AD/A-004  159 

COMPUTER  RECOGNITION  OF  FACIAL  PROFILES 
Gerald  J.  Kaufman 
Ohio  State  University 


Prepared  for: 

Air  Force  Office  of  Scientific  Research 


August  1974 


DISTRIBUTED  BY: 


National Technical Information Service
U. S. DEPARTMENT OF COMMERCE


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)


1. REPORT NUMBER

[illegible]

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

COMPUTER RECOGNITION OF FACIAL PROFILES

5. TYPE OF REPORT & PERIOD COVERED

Interim

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

Gerald J. Kaufman

8. CONTRACT OR GRANT NUMBER(s)

AFOSR 71-2043

9. PERFORMING ORGANIZATION NAME AND ADDRESS

The Ohio State University
Department of Electrical Engineering
Columbus, Ohio 43210

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

61102F
9769-02

11. CONTROLLING OFFICE NAME AND ADDRESS

Air Force Office of Scientific Research (NM)
1400 Wilson Blvd
Arlington, Virginia 22209

12. REPORT DATE

August 1974

13. NUMBER OF PAGES

154

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

UNCLASSIFIED

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

Reproduced by
NATIONAL TECHNICAL INFORMATION SERVICE
U S Department of Commerce
Springfield VA 22151

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

moment invariants
feature vectors
autocorrelation
silhouette
facial recognition
weighted k-nearest neighbor decision rule
adaptive training

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)


A system  for  the  recognition  of  human  faces  from  full  profile  silhouettes  is 
described.  The  system  is  adaptively  trained  using  a novel  stack-oriented 
training  procedure  which  is  shown  to  be  quite  effective  in  identifying  those 
feature  vectors  which  are  of  most  importance  in  the  recognition  process.  Thus 
the  training  procedure  generally  produces  authority  files  having  a small 
number  of  entries.  The  feature  vectors  used  are  generated  from  a normalized 
autocorrelation  function  expressed  in  polar  coordinates.  These  feature 
vectors are shown to be more effective in the recognition process than are the
moment invariant features, at least for this problem. Experiments are
described  in  which  the  system  attains  a recognition  accuracy  of  90%  in  a 
10  class  problem  using  12-dimensional  circular  autocorrelation  feature 
vectors.  It  is  shown,  by  further  experiments,  that  these  results  are  no 
worse  than  a human  observer's  accuracy. 




"How  can  a computer  be  made  to  recognize 
a human face? This question remains unanswered,
because pattern recognition by
computer is still too crude to achieve
automatic identification of objects as
complex as faces."



Leon  D.  Harmon 
November,  1973 




ACKNOWLEDGMENTS 


I would like to express my sincere gratitude to my adviser,
Professor Kenneth J. Breeding, for his guidance, support, and
understanding. The basis for this thesis, the suggestion that the state
of the art in pattern recognition was sufficient to enable a machine
capable of recognizing human faces to be built, was his. The automatic
training algorithms described in Chapter III and upon which this work
hinges were also his. Professor Breeding not only restrained me from
traveling too far down cul-de-sacs, but also listened patiently to a
multitude of ideas (of varying worth) and offered constructive
criticism of even the least useful. He suggested several of the experiments
described in this thesis. Our discussions added immensely to my
knowledge and understanding of the practical as well as theoretical aspects
of pattern recognition.

The curiosity and support of Mr. Thomas Southard must be
mentioned. Not only was his optimism when things were not going well
appreciated, but in explaining the work to him I clarified the concepts
to myself.

I would like to thank my ten volunteers who submitted without
complaint to my lights and camera. Their patience and cooperation made
this thesis possible and their contribution entitles them to at least
mention by name. They are: Ms. Carol Buccilla, Prof. Hooshang Hemami,
Ms. Julie Kauper, Ms. Kathy Mielke, Ms. Gloria Peloso, Mr. Gene Rissanen,
Mr. Craig Robinson, Mr. Thomas Southard, Prof. John Swartz, and Mr.
Gerald [surname illegible]. In addition, Ms. Kauper, Mr. Robinson, and
Mr. Southard attempted the human profile recognition task described in
Section 3.1.

The work of Mr. Henry Pageas in the preparation of the
photographs used in this thesis is gladly acknowledged.

Ms. Julie Kauper was responsible for translating my illegible
and decidedly minute scrawl into typed copy. Her efforts are most
gratefully appreciated.

This work was supported in part by The Air Force Office of
Scientific Research under Grant AF-AFOSR-71-2048.


TABLE  OF  CONTENTS 


ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

Chapter

I    INTRODUCTION

     1.1  Problem Formulation
     1.2  Survey of Facial Recognition

II   PATTERN RECOGNITION TECHNIQUES

     2.1  Introduction
     2.2  System Description
     2.3  Experimental Procedure
     2.4  Image Filtering
     2.5  Binary Image Transformations
     2.6  Classification Algorithms

III  FACIAL RECOGNITION AND AUTOMATIC TRAINING

     3.1  Facial Profile Recognition
     3.2  Automatic Training
     3.3  Two Training Algorithms
     3.4  An Experimental Comparison
     3.5  A Large Sample Size Experiment

IV   SYSTEM OPTIMIZATION

     4.1  Introduction




LIST OF TABLES

Table

3.1   Facial Profile Recognition Experiment 1 Results
3.2   Facial Profile Recognition Experiment 2 Results
3.3   Facial Profile Recognition Experiment 3 Results
3.4   Facial Profile Recognition Experiment 4 Results
3.5   Facial Profile Recognition Experiment 5 Results
3.6   Typical Moment Invariant Feature Vectors
3.7   Moment Invariants Recognition Experiment 1 Results
3.8   Moment Invariants Recognition Experiment 2 Results
3.9   Moment Invariants Recognition Experiment 3 Results
3.10  Human Facial Profile Recognition Accuracy
3.11  Sort Algorithm Recognition Accuracies
3.12  Sort Algorithm Recognition Accuracies per Class
3.13  Activity Sort Rule on a Large Training Set
3.14  Training Algorithm Statistics
4.1   Circular Autocorrelation Feature Vectors under Image
      Rotation and Size Change for 9" x 12" Rectangle
4.2   Recognition Accuracy for Selected Circular
      Autocorrelation Function Parameters
4.3   Recognition Accuracy per Class for Selected Circular
      Autocorrelation Function Parameters
4.4   Typical Feature Vectors for Selected Circular
      Autocorrelation Function Parameters
4.5   Feature Vectors for Circular Autocorrelation
      Function Parameters am = 1/2, M = 1, N = 12
4.6   Weight Function Comparison for the Distance
      Weighted k-Nearest Neighbor Rule
4.7   Recognition Accuracy for Two Classifier Types
4.8   Recognition Accuracy per Class for Two Classifier Types
4.9   Feature Vector Component Means and Standard Deviations



LIST OF FIGURES

Figure

2.1  The Basic Two-Dimensional Pattern Recognition Machine
2.2  The Nearest Neighbor Rule Decision Surface
3.1  Recognition Accuracy vs. Authority File Depth
3.2  Number of Vectors in Authority Files vs. Input Samples



CHAPTER  I 


INTRODUCTION 


1.1  Problem  Formulation 

The use of computers for the recognition of two-dimensional
images has been the subject of theoretical and experimental research
for over a decade [1,2,3,4]. Originally spurred by the problem of
character recognition for computer input, researchers have recently
begun [5,6,7] to branch out and consider the recognition of other,
more complex, two-dimensional images. This thesis describes an attempt
to apply a portion of the large body of pattern recognition theory to
the problem of machine recognition of human faces. Such a machine
could have obvious applications in many personal identification or
verification roles. Applications in law enforcement, credit
verification, security systems, and surveillance come to mind. This thesis
is mostly experimental in nature, its purpose being to select the
best pattern recognition techniques for the problem and assemble them
into a working system.

The problem may then be stated as follows: to demonstrate that a
system capable of recognizing humans from their facial images (such as
would be obtained by a television camera) in real time with an acceptably
low error rate is possible using presently available hardware and pattern
recognition techniques. The ultimate goal of this work might then be
to have a television camera viewing the entrance to a restricted area
with the video output fed to a computer. If the computer is able to
identify faces, then it can perform a table look-up to find if a person
requesting entrance is authorized and take appropriate action (e.g.,
opening the door, or calling the security guard). Since this thesis
is concerned only with demonstrating that such a machine is possible,
existing hardware was used and the problem was simplified as much as
possible. The details of the problem follow.

It was decided to define a ten class problem, that is, ten
people were chosen to comprise the input set. This number, although
too small for most practical applications, does provide a simple
starting point. The small number of classes allows the data generated to
be analyzed without time consuming calculations and also allows the
input data to be gathered in a reasonable period of time. By character
recognition standards, a ten class problem may not be large enough to
provide a fair test of the classification system. The work of Goldstein,
Harmon, and Lesk [8] indicates, however, that for facial recognition, a
ten class problem may be sufficient to provide some indication of how
the system would respond to a larger problem. Again, it is not the
intent of this thesis to provide a practical pattern recognition system
for human faces, only to demonstrate its feasibility. The ten subjects
may be broken down into the following categories: four were female,
six were male, two wore glasses with dark plastic frames, two wore
wire rimmed glasses. The two-dimensional image was obtained from a
television camera output interfaced directly to the computer [9].
The video signal was quantized to only two levels, black or white,
with the quantization threshold level set by the operator. Because
of this severe quantization it was felt (after some preliminary
experimentation) that the profile view offered the most information and
maximum repeatability from image to image. Profile views were thus used
exclusively in this work. A total of 120 images, 12 per subject,
were taken and stored on magnetic tape for processing. These 120
images comprised the input set. The machine was designed to classify
an input image as belonging to one of the ten classes and output its
classification. The possibility of an input image not belonging to
one of the ten classes was ignored in the interest of simplicity.
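The two-level quantization described above can be illustrated with a minimal sketch. In the thesis the quantization was applied to the video signal itself, so the function name, the 8-bit grey scale, the convention that 1 means white, and the sample frame below are all assumptions made for illustration only.

```python
def quantize_two_level(frame, threshold):
    """Map each grey-level sample to 1 (white) or 0 (black).

    The operator-chosen threshold plays the role of the quantization
    threshold level mentioned in the text; the 0..255 grey scale is an
    assumption of this sketch.
    """
    return [[1 if sample >= threshold else 0 for sample in row] for row in frame]


# Hypothetical 3 x 4 frame of grey levels (not data from the thesis).
frame = [
    [12, 40, 200, 220],
    [15, 180, 210, 230],
    [10, 20, 30, 240],
]
binary = quantize_two_level(frame, threshold=128)
for row in binary:
    print("".join("#" if bit else "." for bit in row))
```

Raising or lowering `threshold` moves the black/white boundary, which is why a silhouette with a sharp outline, such as a profile, survives this quantization better than interior facial detail.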


The rest of this chapter contains a survey of the work done in
the area of human face recognition, both by computer and human
recognizers. Chapter II provides a summary of the more prominent two-dimensional
pattern recognition techniques, with emphasis on the techniques
investigated in this work. Chapter III discusses the results of machine
and human recognition on the 120 facial profiles. Chapter III also
describes two algorithms used to train the pattern recognition system
and experiments to verify their expected operation. Chapter IV gives a
description of the series of experiments performed to optimize the pattern
recognition techniques for the facial recognition problem. Chapter V
concludes the thesis with a summary of the results and a discussion of
areas for further research.


1.2  Survey  of  Facial  Recognition 

Virtually all of the previous work on the problem of facial
recognition has dealt with syntactical information as may be found in
a set of roughly quantized descriptive features such as ear length, lip
thickness, chin profile, etc. [10,11]. No attempt to use statistical
recognition techniques [12] for the identification of faces has been
found.


The first work with syntactic recognition is that of Bertillon
[11] in the classification of facial features for criminological
application. Although later superseded by his work in fingerprint
classification, his facial descriptions were meticulously done and included
sets of descriptive names still used by law enforcement agencies. A
more modern discussion may be found in Allen [13].

The closest to an automatic recognition system is a man-machine
interactive approach described by Goldstein, Harmon and Lesk [10], and
Harmon [14]. This system used a 21-dimensional feature vector. The
vector components were descriptive features quantized on a scale of one
to five. Some examples of the features are: mouth width (short to
long), cheeks (sunken to full), and hair length (short to long). The
input set consisted of 255 faces and the values of the feature vector
components were determined by the average of assignments made by a panel
of ten human observers from three photographs (front, 3/4, and side
views) of each face. The feature vectors were entered into the computer
and a sorting algorithm used to order the vectors from best to worst
match for some input description. The input description was obtained
by first having the operator enter the most "conspicuous" features of
the subject to be identified and then allowing the computer to request
features that would separate the vectors at the top of the rank-ordered
list. The procedure was stopped after ten of the 21 feature vector
components had been entered. Using this technique a recognition accuracy
of 70% was achieved.
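The rank-ordering step can be sketched as follows. The text does not state how the match between a stored vector and a partial description was scored, so the sum of absolute differences used here is an assumption, as are the function name and the three four-feature faces.

```python
def rank_by_description(population, description):
    """Order face names from best to worst match for a partial description.

    population:  dict mapping a face name to its list of feature values
                 (each on the one-to-five scale described in the text).
    description: dict mapping a feature index to the value entered so far.
    The mismatch measure (sum of absolute differences over the entered
    features only) is an assumption of this sketch.
    """
    def mismatch(vector):
        return sum(abs(vector[i] - value) for i, value in description.items())

    return sorted(population, key=lambda name: mismatch(population[name]))


# Hypothetical population with four features per face.
population = {
    "face_a": [1, 5, 3, 2],
    "face_b": [2, 4, 3, 5],
    "face_c": [5, 1, 2, 2],
}
description = {0: 2, 3: 5}  # operator has entered features 0 and 3 only
ranking = rank_by_description(population, description)
print(ranking)
```

Each newly entered feature adds a term to the mismatch, so the list can be re-sorted after every entry, which is what allows the computer to request the feature that best separates the current leaders.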

Kaya and Kobayashi [15] suggested that a set of geometric
parameters could be used to describe a human face. They defined nine
parameters that were Euclidean distances between specified points on the
front view of a face. Some typical parameters are height of lips,
distance between upper lip and nose, distance between lower lip and chin,
and distance between corners of the eyes. All parameters were normalized
by the nose length to provide size invariance. These parameters were
measured from a set of photographs of 62 people. A serial classification or
tree search algorithm was proposed in which each parameter is used in
turn to reduce the population until only one face remains. A
theoretical analysis using assumed parameter probability distributions and
negligible noise showed that the algorithm could achieve a recognition
accuracy of 90% within a population of 15,000 faces.

The other investigations of the recognition of human faces are
concerned with recognition by humans. Goldstein, Harmon, and Lesk
[8] describe a series of experiments leading to 22 subjective features
of human faces that are useful for recognition. The sample set consisted
of Harmon's 255 faces, and ten "jurors" were used to assign numerical
values to the features. The 22 features were selected from a set of 34,
the selection criteria being large variance over the sample set and
small variance over the jurors. The 22 features were tested for
correlation and found to be largely independent by several tests. A
model for classification by humans was proposed in which the features
were ordered from most "extreme" to least "extreme". Each feature was
selected in turn and faces with a feature value close enough (by some
constant threshold) to the specified feature value were kept, while
the rest were rejected. The number of features used to narrow the
sample set to one using this technique was found to depend logarithmically
on the size of the sample set and an equation describing this dependence
was derived. Approximately six features for the 255 samples were used
and it was predicted using the previously derived equation that about
14 features would be necessary for a sample size of
4 x 10^[exponent illegible]. The computer model was verified with an
experiment using human recognizers who obtained a recognition accuracy
of 53%.
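The elimination model described above can be sketched in a few lines. This is only an illustration under stated assumptions: "extreme" is taken to mean distance from the midpoint of a one-to-five scale, and the function name, threshold, and four-face population are hypothetical.

```python
def narrow_population(population, target, feature_order, threshold):
    """Apply features one at a time, keeping only faces within `threshold`.

    Returns the surviving face names and the number of features used.
    """
    survivors = list(population)
    used = 0
    for index in feature_order:
        if len(survivors) <= 1:
            break
        survivors = [
            name for name in survivors
            if abs(population[name][index] - target[index]) <= threshold
        ]
        used += 1
    return survivors, used


# Hypothetical population, three features on a one-to-five scale.
population = {
    "p1": [1, 3, 5],
    "p2": [1, 4, 2],
    "p3": [4, 3, 5],
    "p4": [2, 3, 1],
}
target = population["p1"]
# Assumption: the most "extreme" feature is the one farthest from the
# scale midpoint (3); ties keep their original order.
feature_order = sorted(range(len(target)), key=lambda i: -abs(target[i] - 3))
survivors, used = narrow_population(population, target, feature_order, threshold=1)
print(survivors, used)
```

If every feature rejects roughly the same fraction of the remaining faces, the number of features needed grows with the logarithm of the population size, which is the dependence reported in the text.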


Goldstein and Mackenberg [16] experimented with facial recognition
by humans given only a portion of the face. The recognizer was required
to identify a known person from a photograph which had been masked so
that only a portion of the face was visible. This study indicated that
for recognition the upper parts of the face are more important than the
lower parts. Recognition accuracy was generally better for pictures
that contained several of Harmon's 22 features than for pictures that
did not.

Harmon [17] reported an investigation into the minimum amount
of information necessary for facial recognition by humans. Fourteen
front view photographs were digitized with a flying-spot scanner and
stored on magnetic tape. The high quality image (1024 x 1024 bits)
was fragmented into n x n squares and each square was assigned a
brightness equal to the average of the brightness values of all the
original samples within the square. Brightness was quantized to 8 or 16
levels. The final picture thus obtained contained high frequency
noise corresponding to the block edges and although the energy content
of the high frequencies was small, recognition was improved by low
pass filtering. It was hypothesized that this was due to the eye's
sensitivity to straight lines and regular geometric shapes. With the
facial image quantized to a 16 x 16 grid with 8 grey levels, an average
recognition accuracy of 48% was achieved by 28 human recognizers. The
results also indicated that the grid placement on the photographs may
be critical for optimum recognition. Harmon and Julesz [18] used the
same technique to investigate the effect of noise on the recognition
of faces. They found that random noise at frequencies close to the
spatial quantization frequency "masked" the information and raised the
amount of information necessary for recognition, while random noise of
frequencies greater than two octaves removed from the quantization
frequency had little effect upon recognition.
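The block-averaging operation described for Harmon's experiment can be sketched as below. This is not Harmon's program: the function name, the treatment of `n` as the block side length in samples, the 0..255 brightness scale, and the tiny test image are assumptions of the sketch.

```python
def block_average(image, n, levels, max_value=255):
    """Replace each n x n block by its mean brightness, quantized to `levels`.

    The result holds a level index (0 .. levels-1) at every sample position,
    so each block appears as one uniform square.
    """
    rows, cols = len(image), len(image[0])
    step = (max_value + 1) / levels  # width of one quantization interval
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, n):
        for c0 in range(0, cols, n):
            r1, c1 = min(r0 + n, rows), min(c0 + n, cols)
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(block) / len(block)
            level = min(int(mean // step), levels - 1)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = level
    return out


# Hypothetical 4 x 4 image reduced to a 2 x 2 grid of blocks, 8 grey levels.
image = [
    [0, 10, 200, 220],
    [20, 30, 240, 255],
    [100, 100, 64, 64],
    [100, 100, 64, 64],
]
coarse = block_average(image, n=2, levels=8)
for row in coarse:
    print(row)
```

The sharp steps between neighbouring blocks in `coarse` are the block-edge noise the text refers to; a low pass filter applied afterwards would smooth them.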

Hochberg and Galper [19] tested the perception of, and memory
for, faces by humans. They found that recognition accuracy was
significantly greater for upright than for inverted photographs of
human faces. Faces that were learned from upright photographs seemed
to be tied to orientation, while faces learned from inverted photographs
did not seem to be tied strongly to orientation. Bradshaw and Wallace
[20] used an Identi-Kit to obtain information on how humans recognize
faces. Their results indicated that humans classify faces with a "serial
self-terminating" procedure, that is, each facial feature is considered
in turn until an identification is made, at which point the procedure
stops. They found no evidence for parallel processing, or Gestalt
perception [21].

If pattern recognition is one side of a coin, then pattern
generation is the other (and usually more tractable) side. Two efforts
at the computer generation of faces, although not terribly germane to
the present work, are interesting enough to be mentioned. Gillenson
[22] designed a system for use by "non-artists" to reconstruct a line
drawing of the front view of a face on a cathode ray tube display.
The system was interactive with the user and consisted of a library
of stored features with routines to distort the features to obtain a
better likeness. Parke [23] developed a system to draw high quality
half-tone renderings of a human face in three-dimensional perspective.
The skin surface was approximated by a net of polygons and a shading
algorithm was used to give a continuous curved appearance. The face
was represented by a matrix describing the polygon vertices, and
animation was achieved by interpolating between the vertices' positions for
two end expressions.

In addition to the above, some qualitative work may be briefly
mentioned. The Identi-Kit is the best known semi-automatic system for
generating line drawings of the front view of a human face and is used
mainly by police departments. The Identi-Kit uses clear plastic
overlays, each having a single facial feature, to form a composite picture.
There are several different overlays for each feature, so that by choosing
the appropriate features a reasonable facsimile of a specific human face
may be produced. There are several other similar systems in use, and a
summary of their differences and operation may be found in Gillenson [22].

Wall [24] indicates that criminological eyewitness identification
is virtually always subject to error unless the witness knew the subject
in advance of the act. The factors of fright, similarity of faces,
and poor and fast viewing conditions tend to make accurate recognition
difficult. Although artists have been drawing human faces for all of
history, there is little in artistic literature discussing the
recognition of faces. Willis [25] mentions the different variations of facial
features and how they may be realistically represented in drawings.

The above survey points out the basic differences between the
previous work in facial recognition and the work described in this thesis.
This thesis describes the development of a completely automatic and
real time facial recognition system. This system uses statistical
pattern recognition techniques to classify facial profiles obtained from a
television camera input. Previous systems have been at best man-machine
interactive with the facial features described to the machine by the
human observer from a photograph. Previous work has also depended upon
syntactic pattern recognition with no investigation into the
applicability of statistical pattern recognition to the problem. Finally, the
facial recognition system described in this thesis provides the highest
recognition accuracy of any system known to the author.


CHAPTER II


PATTERN RECOGNITION TECHNIQUES


In this chapter the pattern recognition aspects of this
research are discussed. A short review of the two-dimensional
pattern recognition problem is provided, followed by a description of
the hardware and some of the software involved in this particular
system.

2.1  Introduction 

Although the term pattern recognition encompasses far more than
picture recognition, in the interest of conciseness, only the
techniques associated with two-dimensional image recognition and germane
to this work will be mentioned. Most designers of digital two-dimensional
pattern recognition machines seem to use a rather standard structure.
This basic form is shown in Figure 2.1. Most workers in the image
recognition area ignore the problem of target acquisition. The scene
presented to the machine contains only the pattern to be recognized,
a simplification which may not be realistic outside the laboratory
environment.

The input device to the image recognition system may be
nonexistent (i.e., the pattern is digitized by hand and entered through a
standard peripheral), an array of photocells, a flying spot scanner, or
television camera. The input device is usually responsible for the
image digitization. Prefiltering of the image may or may not be used.
Prefiltering is used to remove any extraneous noise from the image,
and, in some cases, smooth the image boundary.

The process of feature extraction, or image transformation, is
used to reduce the amount of information to be handled by the classifier
by extracting "features" from the image that are in some sense
representative of the image. It is desirable that these features be
invariant with respect to image position (translation), size change, and
rotation, so that an image which is modified by any combination of the
above three transformations will be classified the same as the original
image. Several techniques for feature extraction from two-dimensional
images have been developed. The earliest used are the related
techniques of cross-correlation, template matching, and matched filtering
[1,26]. These methods compare a stored pattern against the input image
and produce a single metric which is related to the 'goodness of match'.
These techniques have several disadvantages. Since comparison patterns
(templates) must be stored, the machine memory size must be very large
even for simple recognition problems consisting of only a few classes
with small image sizes (large quantization intervals). In general these
techniques are not size or rotation invariant.
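A minimal sketch of template matching on binary images may make the 'goodness of match' metric, and its lack of invariance, concrete. The agreement-fraction score, the function names, and the 3 x 3 templates are assumptions chosen for illustration; they are not taken from the cited references.

```python
def match_score(image, template):
    """Fraction of positions at which a binary image agrees with a template."""
    total = sum(len(row) for row in template)
    agree = sum(
        1
        for image_row, template_row in zip(image, template)
        for a, b in zip(image_row, template_row)
        if a == b
    )
    return agree / total


def classify_by_template(image, templates):
    """Return the label of the stored template with the best match score."""
    return max(templates, key=lambda label: match_score(image, templates[label]))


# Hypothetical stored templates, one per class.
templates = {
    "vertical_bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal_bar": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
noisy = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]    # vertical bar, one noisy sample
shifted = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # same bar moved one column right

print(classify_by_template(noisy, templates))
print(classify_by_template(shifted, templates))
```

The noisy bar is still matched to its own template, but the bar shifted by a single column scores higher against the wrong template, which illustrates why translated, resized, or rotated inputs require either many more stored templates or invariant features.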

Another technique is that of geometric features extracted by
local neighborhood operations in array processors. First proposed by
Unger [27,28], the work has been extended by several other
researchers [29,30]. The major problem with this technique seems to be
its lack of generality: although it has been shown to work for simple
geometric shapes [31], it has not been successfully applied to the
recognition of complex objects. The features are usually not invariant
with respect to size or rotation.

A class of feature extraction techniques that seems to be
gaining support at present is the frequency domain algorithms.
Among these are the impulse response filters [32], the discrete Fourier
transform, and the Walsh/Hadamard/Haar transforms. The impulse response
filters are often used in the analysis of aerial photographs and terrain
classification [33] to detect small areas of interest such as orchards,
oil tank farms, and railroad yards. The filters are simply a
distribution of integer weights on a grid such that when the appropriate feature
(e.g., a straight line) is centered under the grid the sum of all weights
times their associated optical densities is above a threshold. The
discrete Fourier transform has given good results on a number of pattern
recognition problems [6]. The use of digital transform domains such
as the Walsh-Hadamard and Haar has been proposed [34]. These techniques
generally suffer from the same lack of invariance as the other methods
discussed above, i.e., size and rotation, although Richard's technique
of using Fourier descriptors of the boundary curve of an image [7] is
both size and rotation invariant.
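The weighted-grid filter described above (a sum of weights times optical densities compared against a threshold) can be sketched as follows. The particular 3 x 3 vertical-line weights, the threshold, and the density array are hypothetical, and positions are reported by the grid's top-left corner rather than its center.

```python
def filter_response(densities, weights, row, col):
    """Sum of weights times densities with the grid's top-left at (row, col)."""
    return sum(
        weights[i][j] * densities[row + i][col + j]
        for i in range(len(weights))
        for j in range(len(weights[0]))
    )


def detect(densities, weights, threshold):
    """List every grid position whose response exceeds the threshold."""
    height, width = len(weights), len(weights[0])
    hits = []
    for row in range(len(densities) - height + 1):
        for col in range(len(densities[0]) - width + 1):
            if filter_response(densities, weights, row, col) > threshold:
                hits.append((row, col))
    return hits


# Hypothetical integer weights tuned to a vertical line through the grid center.
line_weights = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]
# Hypothetical optical densities: a three-sample vertical line in column 2.
densities = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(detect(densities, line_weights, threshold=5))
```

Only the position where the whole line lies under the center column of the grid exceeds the threshold; the same weights respond weakly to the line when it is off-center or rotated, which is the lack of invariance noted in the text.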

The last class of techniques is that of arbitrary transformations.
These are transforms designed to be invariant with respect to
translation, size change, and rotation. Circular auto-correlation and moment
invariants [35] are in this class. Moment invariants have been used
by Dudani [5] with excellent results. Circular auto-correlation was
developed during this research and will be described below.

The final process in a pattern recognition system is classification.
Usually the unknown feature vector is compared against a list
of vectors for each class (authority files) and some metric for each
class, corresponding to the probability that the unknown vector belongs
to that class, is computed. The unknown vector is then assigned to
the class to which it has the highest probability of belonging.

A probabilistic classifier of the Bayes type will theoretically
give the highest possible recognition accuracy on a given set of feature
vectors [36]. The drawback to the Bayes classifier is the need to know
the a priori probability density functions for the occurrence of a
vector and the occurrence of a vector given that it belongs to a specified
class. These functions are seldom, if ever, known. The experimental
determination of such functions requires large sample sizes
usually not available to the researcher. Without accurate probability
density functions the Bayes classifier may not perform as well as a
non-probabilistic classifier [5].

A simple piecewise-linear classifier is the nearest neighbor
classifier. In this method the unknown feature vector is assigned to
the class to which it has the smallest Euclidean distance. It has been
shown that this technique will produce a recognition accuracy no worse
than one-half that of an optimal (Bayes) classifier [37].

A more complex and non-linear classifier is the distance-weighted
k-nearest neighbor rule [5,37]. This technique assigns weights to the
k nearest neighbors of the unknown vector from the authority files. The
weights assigned to the k nearest neighbors are summed with respect to
class and the unknown vector is assigned to the class with the highest
weight.


The vectors used in a classifier may be either normalized or
unnormalized. One method of normalization is to divide all vectors by
their length, so that only vector angles determine classification.
Another method is to subtract from each component of the vector that
component's mean and then divide by its standard deviation [33]. The
type of vector normalization used, if any, depends upon both the feature
extraction and classification algorithms.
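The two normalizations just described can be sketched as follows. This is only an illustration, not the routine used in this work: the function names are hypothetical, and the use of the population standard deviation (division by the number of vectors) is an assumption.

```python
import math

def unit_length(vectors):
    """Divide each vector by its Euclidean length (zero vectors are left unchanged)."""
    out = []
    for v in vectors:
        length = math.sqrt(sum(x * x for x in v))
        out.append([x / length for x in v] if length > 0 else list(v))
    return out

def standardize(vectors):
    """Subtract each component's mean and divide by its standard deviation.

    Assumption: the population standard deviation is used, and a component
    with zero spread is mapped to 0.0.
    """
    count = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[k] for v in vectors) / count for k in range(dims)]
    stds = [math.sqrt(sum((v[k] - means[k]) ** 2 for v in vectors) / count)
            for k in range(dims)]
    return [[(v[k] - means[k]) / stds[k] if stds[k] > 0 else 0.0
             for k in range(dims)] for v in vectors]
```

With the first method only the direction of a vector is retained; with the second, every component contributes on a comparable scale to a Euclidean distance.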


2.2  System  Description 

The hardware used in this experiment will now be described.
The computer used to implement the recognition algorithms is the Ohio
State University Electrical Engineering Department's PDP-9. This particular
machine configuration contains, in addition to the standard
peripherals, a Tektronix 611 bistable storage cathode ray tube display.
Connected to the PDP-9 is a closed circuit black and white television
camera. The interface [9] which connects the camera to the computer
contains a circuit to threshold the video input signal from the camera
to a binary output. The threshold level is adjustable and the binary
video signal is fed to a television monitor so that the operator may
adjust the threshold level to compensate for changing light levels and
obtain the desired image. The images obtained from this system are
360 x 240 bit binary arrays and are stored in the machine as 20 x 240
arrays of 18-bit words. The interface actually reads only about 180 bits
per scan line, which gives a distorted image when displayed on an array
with equal horizontal and vertical bit spacing. An aspect ratio correction
routine is therefore employed [9] to approximately double the
number of bits horizontally, which results in an undistorted image.
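The storage arithmetic above (360 bits per line held in 20 words of 18 bits) can be illustrated with a short sketch. The actual correction routine is described in [9] and is not reproduced here; repeating each bit once, and packing the most significant bit first, are assumptions made only for this example, as are the function names.

```python
WORD_BITS = 18                          # PDP-9 word length
ROW_BITS = 360                          # bits per corrected scan line
WORDS_PER_ROW = ROW_BITS // WORD_BITS   # 20 words per line

def double_width(row):
    """Approximate the aspect ratio correction by repeating every bit once."""
    out = []
    for bit in row:
        out.extend((bit, bit))
    return out

def pack_row(row):
    """Pack a 360-bit row into 20 integers of 18 bits, most significant bit first."""
    if len(row) != ROW_BITS:
        raise ValueError("expected a row of %d bits" % ROW_BITS)
    words = []
    for w in range(WORDS_PER_ROW):
        value = 0
        for bit in row[w * WORD_BITS:(w + 1) * WORD_BITS]:
            value = (value << 1) | bit
        words.append(value)
    return words

def unpack_row(words):
    """Inverse of pack_row: expand 18-bit words back into a list of bits."""
    row = []
    for value in words:
        row.extend((value >> shift) & 1 for shift in range(WORD_BITS - 1, -1, -1))
    return row
```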

2.3  Experimental  Procedure 

The  procedure  used  to  obtain  a facial  profile  consisted  of 
seating  the  subject  in  front  of  a black  backdrop  facing  perpendicular 
to  the  optical  axis  of  the  television  camera.  The  camera  position  was 
adjusted  so  that  the  subject's  profile  filled  the  monitor  screen.  The 
lighting,  video  threshold,  and  camera  aperture  were  selected  to  obtain 
a reasonable replica on the monitor. In this system, subjects' flesh
appeared  white  on  the  monitor  and  the  hair  and  background  were  black. 
Appendix  A contains  samples  of  negatives  of  the  facial  profiles  obtained. 
Subjects  8 and  10  wore  horn  rimmed  and  black  frame  glasses,  respectively, 
and  as  can  be  seen  from  Appendix  A,  these  glasses  were  thresholded  as 
black.  Subjects  5 and  6 wore  wire  rimmed  glasses,  which  are  not  visible 
on  the  thresholded  image. 

Once a satisfactory image was obtained on the monitor, the image
was stored on a disk file and later transferred to DECtape. The idea
here was to build a large file of facial images so that it would not be
necessary to have the subjects present every time an identification
system was to be tested. With this file of images, it was then possible
to divide it into two distinct sets, one for training the recognition
system and one for testing the recognition accuracy with unknown profiles.

It was the author's intention at the beginning of this work to
obtain one image per day for each of the ten subjects. It was felt
that this procedure would give a set of images with 'real' day to day
variations and thus be a reasonable data set upon which to base a pattern
recognition system. This dream was shattered by the practical problem
of scheduling computer time and matching personal schedules. In the
end, most of the images were obtained in two sittings of six images
each per subject, for a total of 120 images. This total data set may
be divided into two sets of 60 images, 6 images per subject. The first
60 images were obtained with changes in lighting, video threshold and
camera aperture settings, and subject position in an attempt to introduce
variations in samples within a given class. The second set of 60
images was obtained with fixed lighting, video threshold, camera aperture,
and subject position in an attempt to reduce the sample variation to a
minimum.

2.4  Image  Filtering 

The pattern recognition system described in this thesis uses two
image filtering routines. The first is a 'prefilter' routine whose
purpose is to remove high frequency noise on the image boundary. This
noise is caused by several factors. First is the quantization noise
inherent in any digitization process. Second is the noise generated
by the aspect ratio correction mentioned previously. Because the
number of horizontal bits is almost doubled, steps in the image boundary
two cells wide tend to occur. Third is the noise generated by
inherent instabilities in the video level and threshold circuits. The
video signal from the television camera is not very stable and consequently
it is difficult to obtain a smooth boundary on the thresholded
image for a light to dark transition in the viewed scene. This effect
is most apparent in the hairline area of the facial profiles, as can be
seen from the pictures contained in Appendix A. The second filter is
used to extract the front edge of the facial profile. This filter rejects
the larger image variations caused by the highly variable hairline
area described below.

The prefilter is implemented with binary array processor simulation
routines [39]. Briefly, the array processor structure of this
simulation is a synchronous two-dimensional array of storage cells.
Each cell contains one bit of a binary image. The next state of each
cell is a binary function of its present state and the states of the
eight "neighbor" cells adjoining it (the simulation routines use a
rectangular tessellation [40]). Threshold logic is used to implement the
binary function. The algorithm for the prefilter is to first set to zero
any cell whose eight neighbors are not all one, thus removing "noise"
cells. Then any cell that is zero and has at least one neighbor that
is one on a Von Neumann neighborhood [41] is changed to one. Finally,
any zero cell that has at least one neighbor that is one on a Moore
neighborhood [41] is changed to one. The last two operations have the
effect of smoothing the image boundary as well as filling one- and
two-cell-wide gaps in the image. The boundary smoothing is particularly
useful in this system because the aspect ratio correction [9] required
by the television camera interface tends to generate two-cell-wide steps
in the input image.
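The three prefilter passes described above can be sketched in a few lines. This is an illustrative sketch rather than the array processor simulation of [39]: the image is a list of rows of 0/1 values, the function names are hypothetical, and cells outside the array are treated as zero, which is an assumption about the border handling.

```python
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]
MOORE = VON_NEUMANN + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def cell(img, r, c):
    """Value of a cell; cells outside the array count as zero (assumption)."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0

def erode(img):
    """Keep a one cell only if all eight of its neighbors are also one."""
    rows, cols = len(img), len(img[0])
    return [[1 if img[r][c] and all(cell(img, r + dr, c + dc) for dr, dc in MOORE)
             else 0 for c in range(cols)] for r in range(rows)]

def dilate(img, neighborhood):
    """Change a zero cell to one if any cell of the given neighborhood is one."""
    rows, cols = len(img), len(img[0])
    return [[1 if img[r][c] or any(cell(img, r + dr, c + dc) for dr, dc in neighborhood)
             else 0 for c in range(cols)] for r in range(rows)]

def prefilter(img):
    """Noise removal pass, then Von Neumann growth, then Moore growth."""
    return dilate(dilate(erode(img), VON_NEUMANN), MOORE)
```

Each pass builds a new array from the previous one, which mirrors the synchronous update of the array processor: every cell's next state depends only on present states.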

The edge extraction filter is used to remove the back of the
facial profile. It was determined early in this work that the television
camera and threshold circuit used to obtain the binary images gave
drastically different images in the hairline and chin-neck areas for
slight lighting changes. It was also felt that the hairline and collar
areas would be highly variable on a day to day basis. For these two
reasons it was decided to mask all but the front of the facial profile
and use only this edge for classification. The edge extraction is
accomplished by duplicating the input image, shifting the duplicated
image backwards, and then setting to zero all image cells covered by
one cells of the duplicated array. The second array is then moved up
and down with the image cells overlaid by the one cells of the second
array being set to zero after each move. The distance of all translations
is given by

$$d = K\sqrt{A} \tag{2-1}$$

where: d is the translation distance of the duplicated array,
       A is the total image area, i.e., the number of one cells,
       K is a constant empirically determined.

This relation tends to make the routine size independent (i.e., the
same height-to-width ratio of the filtered image is maintained). This
is not a very accurate method of obtaining size independence, since
two images of a subject may have significantly different areas, due to
the input system flaw mentioned above (see Appendix A). The method
did, however, seem to be adequate for this system. This filter routine
is certainly not the only, or even the best, method of obtaining this
function, since it leaves the forehead area dependent upon the hairline
(see Appendix B). It does have the advantages of being simple, fast,
translation and size invariant, and usable for any rotation angle.
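A minimal sketch of the edge extraction filter follows. It rests on several assumptions that are not confirmed by the text: the translation distance is taken proportional to the square root of the image area, the profile is assumed to face right (so "backwards" is a shift to the left), the up and down moves are applied to the already back-shifted copy, and the value of K in any call is hypothetical.

```python
import math

def shifted(img, dr, dc):
    """Copy of img translated by (dr, dc); cells moved in from outside are zero."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and 0 <= r + dr < rows and 0 <= c + dc < cols:
                out[r + dr][c + dc] = 1
    return out

def clear_covered(img, mask):
    """Set to zero every image cell that is covered by a one cell of mask."""
    return [[0 if m else v for v, m in zip(irow, mrow)]
            for irow, mrow in zip(img, mask)]

def extract_front_edge(img, K, back=(0, -1)):
    """Keep only the front strip of a binary profile.

    back is the unit (row, column) step pointing away from the face;
    (0, -1) assumes a profile facing right.
    """
    area = sum(map(sum, img))
    d = max(1, round(K * math.sqrt(area)))   # translation distance (assumed form)
    dr, dc = back[0] * d, back[1] * d
    out = clear_covered(img, shifted(img, dr, dc))        # shift backwards
    out = clear_covered(out, shifted(img, dr - d, dc))    # then move up
    out = clear_covered(out, shifted(img, dr + d, dc))    # then move down
    return out
```

Because d scales with the linear size of the image, the retained strip keeps roughly the same proportions when the subject is nearer to or farther from the camera.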

The  binary  array  processor  simulation  routines  used  in  this  pattern 
recognition  scheme  could  have  been  used  to  obtain  an  image  edge  simply 
by  setting  to  zero  all  cells  with  the  proper  neighborhood.  For  example, 
to  obtain  the  right  edge  of  an  image  any  cell  whose  righthand  neighbor 
is one should be set to zero. This method of edge extraction was not
used  for  two  reasons.  It  was  felt  that  the  binary  image  transformations 
would  be  less  susceptible  to  noise  if  they  were  presented  with  a solid 
image  rather  than  a single-cell-wide  edge.  Also,  this  edge  extraction 
technique  can  only  extract  edges  at  45°  increments,  which  is  undesirable 
if  the  system  is  to  be  expanded  to  work  with  arbitrary  image  rotations. 
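For comparison, the neighborhood method that was considered and rejected is easily sketched. The function name is hypothetical, and treating cells beyond the array as zero is an assumption.

```python
def right_edge(img):
    """Zero every cell whose right-hand neighbor is one.

    What remains is the single-cell-wide right edge of each run of one cells.
    """
    cols = len(img[0])
    return [[1 if row[c] and not (c + 1 < cols and row[c + 1]) else 0
             for c in range(cols)] for row in img]
```

The result is only one cell wide, which illustrates why a solid strip was preferred as input to the transformations.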


2.5  Binary  Image  Transformations 

This research uses two of the image transformation techniques
previously discussed, correlation and moments. Correlation was chosen
mainly for its ease of computation, its ability to be made translation
and size invariant, and its predictable and easily computed variation
under rotation. Although correlation does not presently seem to be
a popular technique with researchers in the pattern recognition area,
it has been used by Horwitz and Shelton [42] on a simple character
recognition experiment with good results. Moment invariants were chosen
because of the excellent results Dudani [5] obtained on aircraft
recognition with this technique. It was felt that moments would provide a
reasonable benchmark to compare correlation against, as well as provide
information on how well this transformation performs on a different set
of binary images with an increased number of classes.

Consider then a finite, continuous, binary, two-dimensional
image, I, on some plane, P. For any point p in P, I is assigned the
value 0 or 1. Introducing a cartesian coordinate system in P with
coordinates (x,y) allows I to be written as a function of x and y:

$$f(x,y) \in \{0, 1\} \quad \text{for any real } x, y; \qquad -\infty < x < \infty, \quad -\infty < y < \infty \tag{2-2}$$


The plane is infinite in extent, but we will require I to be finite;
that is, for the image area defined as:

$$A = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy \tag{2-3}$$

(which is simply the number of one cells for a binary image), we require
A to be finite: $A < \infty$.

The image may be digitized with a two-dimensional sampling function:

$$f(m\gamma, n\gamma) = f(x,y)\,\delta(x - m\gamma)\,\delta(y - n\gamma) \tag{2-4}$$

for m and n integers, $-\infty \le m \le \infty$, $-\infty \le n \le \infty$.
$\delta(x)$ is the impulse function, defined as

$$\delta(x) = \begin{cases} 1, & x = 0 \\ 0, & \text{otherwise} \end{cases} \qquad -\infty < x < \infty \tag{2-5}$$

and $\gamma$ is the sampling interval (a real constant). The effect of
digitization is simply to add a quantization noise, $\eta$, to I and any
function of I:

$$\eta(x\gamma, y\gamma) = f([x\gamma],[y\gamma]) - f(x\gamma, y\gamma) \tag{2-6}$$

where [x] is the greatest integer function.
The mean error is then given by:

$$e = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \eta(x,y)\,dx\,dy \tag{2-7}$$


The quantization noise is usually reduced to a tolerable value simply
by setting the sampling interval $\gamma$ much smaller than the size of any
portion of I that is of interest. Rosenthal has shown [43] that any
function of a digitized image may be expressed as a continuous function
of the continuous image to within some quantization error. In the
following discussion this approach will be taken since it results in a
somewhat simpler and, in the author's opinion, more elegant formulation.
In the pattern recognition work described in this thesis, the input
system noise and the image variations within a given class are much
larger than the quantization error, so no analysis of the quantization
error was performed.

Consider the two-dimensional autocorrelation of I:


$$g(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,f(x+u,\,y+v)\,dx\,dy \tag{2-8}$$

for u, v real, $-\infty \le u \le \infty$, $-\infty \le v \le \infty$.

Using the identities

$$u = \sqrt{A}\,\alpha\cos\theta \tag{2-9}$$

$$v = \sqrt{A}\,\alpha\sin\theta \tag{2-10}$$

for $\theta$, $\alpha$ real, $0 \le \theta \le \pi$, $0 < \alpha < \infty$,
the autocorrelation function of I may be expressed in a size normalized
polar coordinate form:

$$g(\alpha,\theta) = \frac{g(u,v)}{A} \tag{2-11}$$

The factors $A^{-1}$ and $A^{1/2}$ are used to obtain size invariance of $g(\alpha,\theta)$.
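The autocorrelation function is even in u and v, which translates into
periodicity in the polar coordinates with period $\pi$. This may be
demonstrated by setting:

$$\alpha' = \alpha \tag{2-12}$$

$$\theta' = \theta + \pi \tag{2-13}$$

Then

$$A' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = A \tag{2-14}$$

$$u' = \sqrt{A'}\,\alpha'\cos\theta' = \sqrt{A}\,\alpha\cos(\theta+\pi) = -\sqrt{A}\,\alpha\cos\theta = -u \tag{2-15}$$

$$v' = \sqrt{A'}\,\alpha'\sin\theta' = \sqrt{A}\,\alpha\sin(\theta+\pi) = -\sqrt{A}\,\alpha\sin\theta = -v \tag{2-16}$$

$$g(u',v') = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,f(x+u',\,y+v')\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,f(x-u,\,y-v)\,dx\,dy \tag{2-17}$$

Now using the substitution

$$x' = x - u \tag{2-18}$$

$$y' = y - v \tag{2-19}$$

$$dx' = dx, \qquad dy' = dy$$

we have

$$g(u',v') = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x'+u,\,y'+v)\,f(x',y')\,dx'\,dy' = g(u,v) \tag{2-20}$$

therefore

$$g(\alpha',\theta') = \frac{g(u',v')}{A'} = g(\alpha,\theta) \tag{2-21}$$

and thus $g(\alpha,\theta)$ is periodic in $\theta$ with period $\pi$.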

The autocorrelation function is also invariant under translation,
since with

$$x' = x + a \tag{2-22}$$

$$y' = y + b \tag{2-23}$$

for a, b arbitrary real constants, $-\infty < a < \infty$, $-\infty < b < \infty$,

$$A' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x+a,\,y+b)\,dx\,dy \tag{2-24}$$

$$g'(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x+a,\,y+b)\,f(x+u+a,\,y+v+b)\,dx\,dy \tag{2-25}$$

Changing the variables of integration to x' and y',

$$dx' = dx, \qquad dy' = dy$$

$$A' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x',y')\,dx'\,dy' = A \tag{2-26}$$

$$g'(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x',y')\,f(x'+u,\,y'+v)\,dx'\,dy' = g(u,v) \tag{2-27}$$

and then also

$$g'(\alpha,\theta) = g(\alpha,\theta) \tag{2-28}$$


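The size normalized polar form of the autocorrelation function
is invariant under image size change. To show this, let

$$x' = ax \tag{2-29}$$

$$y' = ay \tag{2-30}$$

for a an arbitrary real constant (the magnification), $0 < a < \infty$,
and let $f'(x',y') = f(x,y)$ denote the magnified image. Its area and
autocorrelation are

$$A' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f'(x',y')\,dx'\,dy' \tag{2-31}$$

$$g'(u',v') = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f'(x',y')\,f'(x'+u',\,y'+v')\,dx'\,dy' \tag{2-32}$$

Changing the variables of integration to x and y, with

$$dx' = a\,dx, \qquad dy' = a\,dy$$

gives

$$A' = a^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = a^2 A \tag{2-33}$$

$$g'(u',v') = a^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,f\!\left(x+\frac{u'}{a},\,y+\frac{v'}{a}\right)dx\,dy = a^2\,g\!\left(\frac{u'}{a},\frac{v'}{a}\right) \tag{2-34}$$

and

$$u' = \sqrt{A'}\,\alpha\cos\theta = \sqrt{a^2 A}\,\alpha\cos\theta = a\sqrt{A}\,\alpha\cos\theta = au \tag{2-35}$$

$$v' = \sqrt{A'}\,\alpha\sin\theta = av \tag{2-36}$$

so

$$g'(\alpha,\theta) = \frac{g'(u',v')}{A'} = \frac{a^2\,g(u,v)}{a^2 A} = g(\alpha,\theta) \tag{2-37}$$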


The size normalized polar form of the autocorrelation function
is not invariant under image rotation, but it does change by only a
phase factor equal to the image rotation angle. This can be seen using
the identities:

$$x' = x\cos\phi - y\sin\phi \tag{2-38}$$

$$y' = x\sin\phi + y\cos\phi \tag{2-39}$$

for rotation angle $\phi$, $0 \le \phi \le 2\pi$. Then

$$A' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x\cos\phi - y\sin\phi,\; x\sin\phi + y\cos\phi)\,dx\,dy \tag{2-40}$$

$$g'(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x\cos\phi - y\sin\phi,\; x\sin\phi + y\cos\phi)\, f[(x+u)\cos\phi - (y+v)\sin\phi,\; (x+u)\sin\phi + (y+v)\cos\phi]\,dx\,dy \tag{2-41}$$

Changing the variables of integration to x' and y', the Jacobian is

$$J = \begin{vmatrix} \dfrac{\partial x'}{\partial x} & \dfrac{\partial x'}{\partial y} \\ \dfrac{\partial y'}{\partial x} & \dfrac{\partial y'}{\partial y} \end{vmatrix} = \begin{vmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{vmatrix} = \cos^2\phi + \sin^2\phi = 1$$

so that

$$A' = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x',y')\,dx'\,dy' = A \tag{2-42}$$

$$g'(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x',y')\,f[x' + (u\cos\phi - v\sin\phi),\; y' + (u\sin\phi + v\cos\phi)]\,dx'\,dy' = g(u\cos\phi - v\sin\phi,\; u\sin\phi + v\cos\phi) \tag{2-43}$$

and if we let

$$\alpha' = \alpha, \qquad \theta' = \theta + \phi \tag{2-44}$$

$$u' = \sqrt{A}\,\alpha\cos(\theta+\phi) = \sqrt{A}\,\alpha(\cos\theta\cos\phi - \sin\theta\sin\phi) = u\cos\phi - v\sin\phi \tag{2-45}$$

$$v' = \sqrt{A}\,\alpha\sin(\theta+\phi) = \sqrt{A}\,\alpha(\sin\theta\cos\phi + \cos\theta\sin\phi) = u\sin\phi + v\cos\phi \tag{2-46}$$

then

$$g'(\alpha,\theta) = g(\alpha',\theta') \tag{2-47}$$

that is, the rotation appears only as a shift of $\phi$ in $\theta$.

The function $g(\alpha,\theta)$ may be used to transform a binary image into
an MN-dimensional vector by setting:

$$\alpha = am \tag{2-48}$$

$$\theta = \frac{(n-1)\pi}{N} \tag{2-49}$$

with a a real constant; m, n, M, N integers; $1 \le m \le M$, $1 \le n \le N$.

For lack of a more inspired name this transformation is referred to as
the circular autocorrelation function. The MN terms of this function
are, as shown above, independent of translation and size changes of the
input image, I (within, of course, quantization error for a discrete
computation). Under an image rotation of $\pi/N$ radians, the terms are
related by

$$g'(m,1) = g(m,N) \tag{2-50}$$

$$g'(m,n) = g(m,n-1), \qquad n > 1 \tag{2-51}$$

and similarly for other increments of $\pi/N$. It can be seen that for
rotation angles other than $n\pi/N$ there will be another quantization
error introduced, and thus N must be chosen large enough to make this
error negligible for the particular application. This relation between
image rotation and the circular autocorrelation term sequence allows the
image angle to be determined to within $\pi/N$ radians (plus a $\pi$ radian
ambiguity because of the periodicity of $g(\alpha,\theta)$ in $\theta$ with
period $\pi$). The penalty for this feature is an increased classification
time over a rotation invariant transformation because of the N searches
of the authority files required. The ability to determine image rotation
angle, however, may be worth the time trade-off in many applications.
Practically, the circular autocorrelation function is computed simply by
duplicating the input image, shifting the duplicated image a distance
$\sqrt{A}\,\alpha$ at angle $\theta$ and counting the number of intersecting
one cells.
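The shift-and-count computation just described can be sketched as follows. This is a hedged illustration, not the PDP-9 routine: the function name is hypothetical, the angle for term n is taken as (n - 1) pi / N (an assumption about the sampling of the angle), shifts are rounded to the nearest whole cell, and the step constant a is a free parameter chosen by the caller.

```python
import math

def circular_autocorrelation(img, M, N, a):
    """Return the M*N terms g(m, n) for a binary image given as rows of 0/1.

    Term (m, n) is the overlap between the image and a copy shifted by
    sqrt(A) * a * m cells at angle (n - 1) * pi / N, divided by the area A.
    """
    ones = {(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v}
    area = len(ones)
    root = math.sqrt(area)
    terms = []
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            alpha = a * m
            theta = (n - 1) * math.pi / N
            u = round(root * alpha * math.cos(theta))   # shift in x, whole cells
            v = round(root * alpha * math.sin(theta))   # shift in y, whole cells
            overlap = sum(1 for (x, y) in ones if (x + u, y + v) in ones)
            terms.append(overlap / area)
    return terms
```

Because the shifts depend only on the area and the overlap count depends only on the relative positions of the one cells, translating the image leaves every term unchanged, which is the translation invariance shown above.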


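The moment transformation is based upon a set of two-dimensional
moment functions derived by Hu [35], which are invariant with respect to
image translation, size, and rotation. The two-dimensional moments are
defined by:

$$m_{pq} = \frac{1}{M}\sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} f(\gamma m,\gamma n)\,m^p n^q, \qquad p, q = 0, 1, 2, \ldots \tag{2-52}$$

$$M = \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} f(\gamma m,\gamma n) \tag{2-53}$$

where: $m_{pq}$ is the (p+q) order moment,
       $f(\gamma m,\gamma n)$ is the digitized image.

The centroid of the image is then given by:

$$\bar{m} = m_{10} = \frac{1}{M}\sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} f(\gamma m,\gamma n)\,m \tag{2-54}$$

$$\bar{n} = m_{01} = \frac{1}{M}\sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} f(\gamma m,\gamma n)\,n \tag{2-55}$$

where $(\bar{m},\bar{n})$ are the coordinates of the centroid of $f(\gamma m,\gamma n)$.

The central moments are defined by:

$$\mu_{pq} = \frac{1}{M}\sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} f(\gamma m,\gamma n)\,(m-\bar{m})^p (n-\bar{n})^q \tag{2-56}$$

The central moments may also be expressed as sums of the ordinary
moments [5]; the expressions for the first three orders are:

$$\mu_{00} = m_{00} = 1 \tag{2-57}$$

$$\mu_{01} = \mu_{10} = 0 \tag{2-58}$$

$$\mu_{20} = m_{20} - (m_{10})^2 \tag{2-59}$$

$$\mu_{02} = m_{02} - (m_{01})^2 \tag{2-60}$$

$$\mu_{30} = m_{30} - 3m_{20}m_{10} + 2(m_{10})^3 \tag{2-61}$$

$$\mu_{21} = m_{21} - m_{20}m_{01} - 2m_{11}m_{10} + 2(m_{10})^2 m_{01} \tag{2-62}$$

$$\mu_{12} = m_{12} - m_{02}m_{10} - 2m_{11}m_{01} + 2(m_{01})^2 m_{10} \tag{2-63}$$

$$\mu_{03} = m_{03} - 3m_{02}m_{01} + 2(m_{01})^3 \tag{2-64}$$

The moment invariants used in this research are:

$$M_2 = \frac{1}{r^4}\left[(\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2\right] \tag{2-65}$$

$$M_3 = \frac{1}{r^6}\left[(\mu_{30} - 3\mu_{12})^2 + (3\mu_{21} - \mu_{03})^2\right] \tag{2-66}$$

$$M_4 = \frac{1}{r^6}\left[(\mu_{30} + \mu_{12})^2 + (\mu_{21} + \mu_{03})^2\right] \tag{2-67}$$

$$M_5 = \frac{1}{r^{12}}\left\{(\mu_{30} - 3\mu_{12})(\mu_{30} + \mu_{12})\left[(\mu_{30} + \mu_{12})^2 - 3(\mu_{21} + \mu_{03})^2\right] + (3\mu_{21} - \mu_{03})(\mu_{21} + \mu_{03})\left[3(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right]\right\} \tag{2-68}$$

$$M_6 = \frac{1}{r^8}\left\{(\mu_{20} - \mu_{02})\left[(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right] + 4\mu_{11}(\mu_{30} + \mu_{12})(\mu_{21} + \mu_{03})\right\} \tag{2-69}$$

$$M_7 = \frac{1}{r^{12}}\left\{(3\mu_{21} - \mu_{03})(\mu_{30} + \mu_{12})\left[(\mu_{30} + \mu_{12})^2 - 3(\mu_{21} + \mu_{03})^2\right] - (\mu_{30} - 3\mu_{12})(\mu_{21} + \mu_{03})\left[3(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right]\right\} \tag{2-70}$$

$$r = (\mu_{20} + \mu_{02})^{1/2} \tag{2-71}$$

It can be shown [5,35] that functions $M_2$ through $M_7$ are invariant under
image translation, size change, and rotation. By computing $M_2$ through
$M_7$ on both the image silhouette and boundary a twelve-dimensional
feature vector may be obtained [5].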
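A short sketch of the moment invariant computation is given below. It assumes the standard form of Hu's invariants with normalization by powers of the radius of gyration r, computes the central moments directly from their definition rather than from the ordinary moments, and takes the image as a list of (m, n) coordinates of its one cells. The function name is hypothetical.

```python
def moment_invariants(points):
    """Size normalized invariants [M2, ..., M7] of a list of (m, n) one cells."""
    count = len(points)
    mbar = sum(m for m, _ in points) / count   # centroid
    nbar = sum(n for _, n in points) / count

    def mu(p, q):
        """Central moment of order (p + q), normalized by the cell count."""
        return sum((m - mbar) ** p * (n - nbar) ** q for m, n in points) / count

    u20, u02, u11 = mu(2, 0), mu(0, 2), mu(1, 1)
    u30, u03, u21, u12 = mu(3, 0), mu(0, 3), mu(2, 1), mu(1, 2)
    r = (u20 + u02) ** 0.5                     # radius of gyration
    s, t = u30 + u12, u21 + u03
    M2 = ((u20 - u02) ** 2 + 4 * u11 ** 2) / r ** 4
    M3 = ((u30 - 3 * u12) ** 2 + (3 * u21 - u03) ** 2) / r ** 6
    M4 = (s ** 2 + t ** 2) / r ** 6
    M5 = ((u30 - 3 * u12) * s * (s ** 2 - 3 * t ** 2)
          + (3 * u21 - u03) * t * (3 * s ** 2 - t ** 2)) / r ** 12
    M6 = ((u20 - u02) * (s ** 2 - t ** 2) + 4 * u11 * s * t) / r ** 8
    M7 = ((3 * u21 - u03) * s * (s ** 2 - 3 * t ** 2)
          - (u30 - 3 * u12) * t * (3 * s ** 2 - t ** 2)) / r ** 12
    return [M2, M3, M4, M5, M6, M7]
```

Applying the function once to the silhouette cells and once to the boundary cells, and concatenating the two results, would give a twelve-dimensional vector of the kind described above.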


2.6  Classification  Algorithms 

Two classification algorithms were investigated in this work, the
nearest neighbor rule and the distance-weighted k-nearest neighbor rule.
The nearest neighbor classifier computes the Euclidean distance between
the unknown vector and every vector in the authority files. The unknown
vector is assigned to the class which contains the vector closest to the
unknown. This may be written as:

$$e_{pq} = \left[\sum_{k=1}^{n} \left(r_k - s_k^{pq}\right)^2\right]^{1/2} \tag{2-72}$$

where: $r_k$ is the k-th element of the unknown feature vector, R,
         of dimension n.
       $s_k^{pq}$ is the k-th element of the p-th vector S in the authority
         file for class q.
       $e_{pq}$ is the 'error' (Euclidean distance) between points R and S.

Then for Q authority files (and hence Q classes) each containing P
vectors (depth P), there is some smallest $e_{mn}$:

$$e_{mn} \le e_{pq} \quad \text{for all } p, q; \qquad 1 \le p \le P, \quad 1 \le q \le Q \tag{2-73}$$

and R is assigned to class n.
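The nearest neighbor rule can be sketched in a few lines. The representation of the authority files as a dictionary from class label to a list of vectors, and the function names, are assumptions made for the illustration.

```python
import math

def euclidean(r, s):
    """Euclidean distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((rk - sk) ** 2 for rk, sk in zip(r, s)))

def nearest_neighbor(unknown, authority_files):
    """Return (class label, distance) of the authority vector closest to unknown.

    authority_files maps each class label to its list of stored vectors.
    """
    best_label, best_error = None, math.inf
    for label, vectors in authority_files.items():
        for s in vectors:
            e = euclidean(unknown, s)
            if e < best_error:
                best_label, best_error = label, e
    return best_label, best_error
```

Every stored vector is examined once, so the classification time grows with both the number of classes Q and the authority file depth P.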

The nearest neighbor rule is a simple piecewise linear classifier;
that is, the class boundaries, or decision surfaces, are defined by segments
of hyperplanes. For example, in a two class problem with two authority
vectors per class, in two dimensions, the decision surface defined
by the nearest neighbor rule is shown in Figure 2.2. An unknown vector
in the space to the left of the decision surface would be assigned to
class 1 by the nearest neighbor rule, while an unknown vector in the
space to the right of the decision surface would be assigned to class 2.
Piecewise linear techniques such as the nearest neighbor rule can be
combined with logical rules to construct quite complex boundaries [44],
but only the simple nearest neighbor rule was used in this research.

The distance-weighted k-nearest neighbor classifier assigns
an unknown vector, R, to the class most heavily represented among its
k nearest neighbors in the authority files. The 'representation' is
computed as the sum of weights assigned to the k vectors. The weights,
in turn, depend inversely upon the distance between the unknown and
authority vectors. Using the notation defined above we may write:

$$w_{pq} = f(e_{pq}) \tag{2-74}$$

where $w_{pq}$ is the weight assigned to the p-th vector in the
authority file of class q.

The actual weight function depends upon the type of feature vector used,
and will in turn affect the decision surface shape. $f(e_{pq})$ is usually
defined such that:

$$\lim_{e_{pq}\to 0} w_{pq} = \lim_{e_{pq}\to 0} f(e_{pq}) = 1 \tag{2-75}$$

$$\lim_{e_{pq}\to\infty} w_{pq} = \lim_{e_{pq}\to\infty} f(e_{pq}) = 0 \tag{2-76}$$

The class weights, $W_q$, are then determined by:

$$W_q = \sum_{p=1}^{P} w_{pq} \tag{2-77}$$

$w_{pq}$ is computed only for the k nearest neighbors of the unknown vector
R, with all other $w_{pq}$ set to zero. k may be a fixed number, in which
case the lowest k $e_{pq}$ are selected to compute $w_{pq}$, or k may be variable
and depend on the number of error terms below some limit:

$$w_{pq} = 0 \quad \text{if } e_{pq} > e_{\max}; \qquad 1 \le p \le P; \quad 1 \le q \le Q \tag{2-78}$$

In any case, there will exist a largest $W_n$:

$$W_n \ge W_q \quad \text{for all } q, \qquad 1 \le q \le Q \tag{2-79}$$

and R is assigned to class n. The decision surface defined by the
distance-weighted k-nearest neighbor rule is decidedly non-linear.
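A sketch of the fixed-k form of this rule follows. The weight function 1 / (1 + e) is a hypothetical choice that merely satisfies the two limits stated above; it is not claimed to be the function used in this work. The dictionary representation of the authority files and the function name are likewise assumptions.

```python
import math

def weighted_knn(unknown, authority_files, k, weight=lambda e: 1.0 / (1.0 + e)):
    """Distance-weighted k-nearest neighbor rule with a fixed k.

    authority_files maps each class label to its list of stored vectors.
    Returns (winning label, dictionary of class weights).
    """
    errors = []
    for label, vectors in authority_files.items():
        for s in vectors:
            e = math.sqrt(sum((rk - sk) ** 2 for rk, sk in zip(unknown, s)))
            errors.append((e, label))
    errors.sort(key=lambda pair: pair[0])          # smallest errors first
    class_weights = {label: 0.0 for label in authority_files}
    for e, label in errors[:k]:                    # all other weights stay zero
        class_weights[label] += weight(e)
    return max(class_weights, key=class_weights.get), class_weights
```

With k = 1 the rule reduces to the nearest neighbor rule; with larger k, two moderately close vectors of one class can outweigh a single equally close vector of another.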
CHAPTER III


FACIAL  RECOGNITION  AND  AUTOMATIC  TRAINING 


3.1  Facial  Profile  Recognition 


Once a training algorithm to select the vectors for the
authority files has been devised and verified and the various system
parameters optimized, the pattern recognition system is complete and
the testing of its ability to recognize facial profiles may begin.
Five experiments were conducted. The procedure in each was to divide
the 120 facial profiles into two distinct sets, a training set and a
test set. The activity sort training rule (to be described later in
this chapter) was used to select vectors from the training set for the
authority files. The recognition accuracy of the system was then
tested with both the training and test sets. The recognition accuracy
obtained on the test set is, of course, the only valid measure of the
system's performance, since the test set consists solely of images
that are unknown to the computer. The recognition accuracy on the
training set, some of whose image feature vectors will be stored in
the authority files, was included simply as a benchmark. In the
remainder of this thesis whenever recognition accuracy is mentioned,
the recognition accuracy on the independent test set is meant
unless otherwise specified.




In experiments 1-4 training with the activity sort rule
was terminated after 250 random selections from the training set had
been made. In experiment 5, 500 selections were used because of the
larger training set size. This was deemed a sufficient number of
inputs to achieve maximum recognition accuracy based on the training
times obtained in the tests of the activity sort training rule described
in Section 3.4.

In experiment 1, samples 1-6 of the facial profile images in
each class were used as the training set, while samples 7-12 of the
facial profile images in each class were used as the test set. In
experiment 2, this was reversed with samples 7-12 used as the training
set and samples 1-6 as the test set. The results of these two experiments
are shown in Tables 3.1 and 3.2, respectively. Note that the
recognition accuracy on the test set is considerably higher in the
first experiment. This is most likely due to the variations introduced
in samples 1-6 of the facial profile images by changes in
lighting, camera aperture, and subject position as described in Section
2.3. When used as a training set these images seem to provide authority
files that better define the region in the feature vector space
in which each class falls (i.e., a better decision surface) than do
samples 7-12, where variations from image to image were held as small
as possible.
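The bookkeeping of these experiments (division of the images by sample number, and accuracy expressed as a percentage of 60 test images) can be sketched as below. The tuple layout and function names are hypothetical, and the classifier itself is omitted since the activity sort training rule is described later in this chapter.

```python
def split_by_sample(images, training_samples):
    """Divide (subject, sample_number, data) tuples into training and test sets."""
    train = [im for im in images if im[1] in training_samples]
    test = [im for im in images if im[1] not in training_samples]
    return train, test

def percent_correct(num_correct, num_total):
    """Recognition accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * num_correct / num_total, 1)
```

For example, 46 correct identifications out of 60 test images correspond to an accuracy of 76.7 percent.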


                         Recognition Accuracy
Authority         Training set                Test set
File Depth    No. Correct   % Correct    No. Correct   % Correct
    1             34          56.7           27          45.0
    2             48          80.0           34          56.7
    3             57          95.0           46          76.7
    4             59          98.3           43          71.7
    5             60         100             41          68.3
    6             60         100             41          68.3

Training Set - Samples 1,2,3,4,5,6
Test Set - Samples 7,8,9,10,11,12

Table 3.1 Facial Profile Recognition Experiment 1 Results


Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

   1           53         88.3           26         43.3
   2           59         98.3           32         53.3
   3           60        100             26         43.3
   4           60        100             26         43.3
   5           60        100             26         43.3
   6           60        100             26         43.3

Training Set - Samples 7,8,9,10,11,12
Test Set - Samples 1,2,3,4,5,6

Table 3.2 Facial Profile Recognition Experiment 2 Results


Experiments 3 and 4 (Tables 3.3 and 3.4, respectively) used
random selections of 6 images per class in the training set and the
remaining images in the test set. Maximum recognition accuracies on
the test set were better than those obtained in experiment 1.
Experiment 5 used a random selection of 9 images per class in the training
set and the remaining images in the test set. The results, in Table
3.5, show the highest recognition accuracy achieved on any test set.

This seems to indicate that if the activity sort rule is presented
with a larger training set, it can determine the decision surfaces
(authority file vectors) to provide better recognition accuracy on
independent data than if it is presented with a smaller training set.
"Seems" is used in the preceding sentence because the small size of
the test set makes generalizing the results very risky, but the
preceding statement does agree with the expected outcome from
probability theory. As more samples of the classes are obtained, the sample
distribution becomes better defined and the decision surfaces can
therefore be adjusted (i.e., authority file vectors picked) to provide
better separation between classes and hence better recognition accuracy.

Notice that in all cases the maximum recognition accuracy on
the test set does not occur at maximum authority file depth, as might
be expected. Plots of recognition accuracy vs. authority file depth
are given in Figure 3.1. The maximum recognition accuracy seems to
occur at about one-half the maximum depth. The reason for this
behavior is not clear; the small sample size makes any generalization
difficult.
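
The depth at which the test-set accuracy peaks can be tabulated directly
from the experimental results. The following is a minimal sketch in
Python, not part of the original work; the test-set counts are
transcribed from Tables 3.1 through 3.5, and the names TEST_CORRECT,
peak_depth, and summarize are illustrative only.

```python
# Test-set "No. Correct" per authority file depth (index 0 is depth 1),
# transcribed from Tables 3.1-3.5 for experiments 1-5.
TEST_CORRECT = {
    1: [27, 34, 46, 43, 41, 41],
    2: [26, 32, 26, 26, 26, 26],
    3: [21, 48, 46, 43, 44, 46],
    4: [28, 38, 43, 47, 45, 45],
    5: [21, 22, 25, 27, 23, 22, 25, 23, 26],
}


def peak_depth(counts):
    """Return (depth, count) for the first maximum; depths are 1-based."""
    best = max(counts)
    return counts.index(best) + 1, best


def summarize(table=TEST_CORRECT):
    """Map experiment number to (peak depth, maximum depth, best count)."""
    result = {}
    for experiment, counts in table.items():
        depth, best = peak_depth(counts)
        result[experiment] = (depth, len(counts), best)
    return result


if __name__ == "__main__":
    for experiment, (depth, max_depth, best) in summarize().items():
        print(f"experiment {experiment}: peak at depth {depth} "
              f"of {max_depth} ({best} correct)")
```

For these data the peaks fall at depths 3, 2, 2, and 4 (of 6) in
experiments 1-4 and at depth 4 (of 9) in experiment 5.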


Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

   1           29         48.3           21         35.0
   2           51         85.0           48         80.0
   3           54         90.0           46         76.7
   4           59         98.3           43         71.7
   5           58         96.7           44         73.3
   6           60        100             46         76.7

Training Set - Random selection 1, 6 samples per class
Test Set - Remaining 60 images, 6 per class

Table 3.3 Facial Profile Recognition Experiment 3 Results


Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

   1           34         56.7           28         46.7
   2           53         88.3           38         63.3
   3           57         95.0           43         71.7
   4           59         98.3           47         78.3
   5           60        100             45         75.0
   6           60        100             45         75.0

Training Set - Random selection 2, 6 samples per class
Test Set - Remaining 60 images, 6 per class

Table 3.4 Facial Profile Recognition Experiment 4 Results


Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

   1           62         68.9           21         70.0
   2           73         81.1           22         73.3
   3           84         93.3           25         83.3
   4           83         92.2           27         90.0
   5           89         98.9           23         76.7
   6           88         97.8           22         73.3
   7           88         97.8           25         83.3
   8           90        100             23         76.7
   9           90        100             26         86.7

Training Set - Random selection 2, 9 samples per class
Test Set - Remaining 30 images, 3 per class

Table 3.5 Facial Profile Recognition Experiment 5 Results

It was initially felt that the results discussed above did
not have sufficiently high recognition accuracy, and because of the
excellent recognition accuracy obtained by Dudani with a moment
invariants feature extractor, it was decided to replace the circular
autocorrelation function with moment invariant functions. The six
moment invariant functions given in Section 2.5 were used twice, once
on the image boundary and once on the silhouette, to obtain a
12-dimensional feature vector. The classification algorithm was
modified to use the weight function and the feature vector
normalization used by Dudani's aircraft recognition classifier. The last
three facial profile recognition experiments were then repeated with
no other changes. Typical feature vectors are given in Table 3.6.

The results are given in Tables 3.7 - 3.9. The low recognition
accuracies on the test sets were totally unexpected. Recognition
accuracies were, in all but one case, lower than those obtained using
circular autocorrelation. There may be two reasons for this result.
The class-to-class shape variations are often more subtle for facial
profiles than for aircraft silhouettes, and due to the idiosyncrasies
of the television camera used for input to the computer, facial
profiles exhibited more variation from sample to sample within a given
class than the aircraft images. Under these conditions circular
autocorrelation seemed to provide feature vectors with better
separability between classes than the moment invariants did.

Table 3.6 Typical Moment Invariant Feature Vectors

                               Feature Vector Component
Class       1            2            3            4            5            6

   1      .556         .555x10^-1   .186x10^-1  -.577x10^-3   .984x10^-2  -.159x10^-3
   2      .622         .818x10^-1   .339x10^-3   .148x10^-5  -.267x10^-3   .100x10^-5
   3      .468         .758         .220         .624x10^-1  -.113        -.647x10^-1
   4      .672         .137         .701x10^-2  -.384x10^-4  -.171x10^-2   .214x10^-3
   5      .641         .629x10^-1   .800x10^-1   .151x10^-2   .638x10^-1   .547x10^-2
   6      .618         .129         .296x10^-1  -.151x10^-2   .932x10^-2  -.103x10^-2
   7      .519         .595x10^-1   .133x10^-1   .124x10^-3  -.194x10^-2  -.350x10^-3
   8      .505         .477         .190         .540x10^-1  -.326x10^-1  -.128x10^-1
   9      .635         .132         .420x10^-3  -.937x10^-6  -.125x10^-3   .299x10^-5
  10      .548         .162         .256x10^-1   .150x10^-2   .124x10^-1   .678x10^-3

                               Feature Vector Component
Class       7            8            9           10           11           12

   1      .707         .876x10^-1   .765x10^-2   .289x10^-4   .342x10^-2  -.196x10^-3
   2      .780         .102         .291x10^-1   .146x10^-2   .128x10^-1   .625x10^-3
   3      .718         .401         .218         .285x10^-1   .182         .582x10^-1
   4      .762         .219         .368x10^-1   .124x10^-2  -.217x10^-3   .306x10^-2
   5      .730         .107         .197x10^-1   .735x10^-3   .568x10^-2  -.542x10^-3
   6      .772         .157         .489x10^-1   .429x10^-2   .319x10^-1   .128x10^-3
   7      .721         .422x10^-1   .282x10^-2  -.124x10^-4  -.180x10^-2  -.283x10^-4
   8      .645         .392         .675x10^-1   .949x10^-2   .480x10^-1  -.242x10^-2
   9      .766         .163         .274x10^-1   .954x10^-3  -.988x10^-3  -.157x10^-2
  10      .737         .646x10^-1   .835x10^-2   .124x10^-4  -.933x10^-3   .193x10^-3

Table 3.7 Moment Invariants Recognition Experiment 1 Results

Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

   1           33         55.0           22         36.7
   2           40         66.7           23         38.3
   3           47         78.3           36         60.0
   4           56         93.3           29         48.3
   5           60        100             29         48.3
   6           60        100             32         53.3

Training Set - Random selection 1, 6 samples per class
Test Set - Remaining 60 images, 6 per class


Table 3.8 Moment Invariants Recognition Experiment 2 Results

Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

Training Set - Random selection 2, 6 samples per class
Test Set - Remaining 60 images, 6 per class


Authority               Recognition Accuracy
  File          Training Set               Test Set
 Depth     No. Correct  % Correct    No. Correct  % Correct

   1           36         40.0           10         33.3
   2           56         62.2           14         46.7
   3           68         75.6           16         53.3
   4           74         82.2           15         50.0
   5           82         91.1           21         70.0
   6           87         96.7           19         63.3
   7           90        100             19         63.3
   8           90        100             17         56.7
   9           90        100             19         63.3

Training Set - Random selection 2, 9 samples per class
Test Set - Remaining 30 images, 3 per class


Table 3.9 Moment Invariants Recognition Experiment 3 Results

In order to determine whether the results of the above
experiments were acceptable, it was decided to compare the recognition
accuracy  of  computer  facial  profile  recognition  against  that  of  humans 
presented  with  the  same  data.  Since  humans  are  generally  regarded  as 
good  pattern  recognizers,  it  was  felt  that  if  the  machine  performed 
comparably  to  a human  the  pattern  recognition  system  would  be  acceptable. 
Accordingly,  the  following  experiment  was  devised.  A facial  profile 
photograph  of  each  of  the  ten  subjects  was  taken.  These  photographs 
became  the  human  recognizer's  "authority  file."  Three  people  were 
chosen  for  this  experiment,  the  first  two  technically  oriented  and 
close  to  this  work,  the  third  non-technically  oriented  and  unfamiliar 
with  this  work.  Each  person  was  given  the  set  of  reference  photographs 
and  seated  in  front  of  the  Tektronix  611  display.  The  facial  profiles 
after  edge  extraction,  as  presented  in  Appendix  B,  were  selected  at 
random  and  displayed  on  the  CRT  one  at  a time.  The  recognizer  was 
given  as  long  as  he  wished  to  make  a classification  by  comparing  the 
CRT  image  against  the  photographs.  All  120  facial  profiles  were  used, 
but  because  of  limited  disk  storage  samples  1-6  (on  all  classes)  were 
presented  first,  and  then  samples  7-12. 

The recognition accuracies of the three human recognizers are
given in Table 3.10. It can be seen that the recognition accuracy on
samples 7-12 is higher than the recognition accuracy on samples 1-6.
There are two possible reasons for this: the recognizers may have had
some difficulty in correlating the CRT images to the reference photographs,
i.e., some early misclassifications were due to learning, and the larger
sample-to-sample variation within a given class for samples 1-6 may have
reduced the human recognizers' accuracy. In any case, comparing these
results with the computer results, it can be seen that using circular
autocorrelation the computer at least matched the human recognition
accuracy in all but experiment 2, while the results of experiment 5 indicate
that with a sufficiently large training set it is possible for a computer
pattern recognition system to achieve significantly higher recognition
accuracy than a human on this problem. The best recognition accuracy
obtained with moment invariants (70%) is comparable to human recognition
accuracy on the 120 images (68-71%), a result similar to, although not
quite as good as, that obtained by Dudani on aircraft recognition [5].


                         Recognition Accuracy
     Samples 1-6              Samples 7-12             Samples 1-12
No. Correct  % Correct   No. Correct  % Correct   No. Correct  % Correct

    39         65.0          46         76.7          85         70.8
    38         63.3          44         73.3          82         68.3
    38         63.3          46         76.7          84         70.0

Table 3.10 Human Facial Profile Recognition Accuracy


3.2 Automatic Training

In the following chapter, the selection and parameter
optimization of two deterministic feature vector classification schemes will be
discussed, while in this chapter the problem of building the authority
file for the classifier will be addressed. It is the authority files'
contents and depth (number of vectors) that determine the decision
surfaces. If the feature vectors are well defined, or a sufficiently
large sample size exists, the authority file vectors and depth for
optimal recognition accuracy may be determined by computation (indeed,
the authority file concept is not even necessary; one may write a
set of equations describing the decision surfaces). However, outside
of a theoretical analysis, this is seldom the case. The problem then
is how, with the data available, the authority files are to be generated.
The construction of the authority file is referred to as training.

Duda and Fossum [44] have described an algorithm that computes an
authority file vector for linearly separable data, i.e., data for which
the classes in the feature vector space may be separated by hyperplanes.
Feature vectors generated by present feature extractors, however, tend
not to fall in such neat classes, and in general a fairly convoluted
decision surface is required to separate the classes. Such a convoluted
decision surface may be obtained by using authority file depths greater
than one. The authority files may be filled with vectors selected from
some training set, i.e., feature vectors generated from a typical set
of images to be recognized. The selection criterion is to obtain the
set of vectors that gives the greatest recognition accuracy. The feature
vectors selected to fill the authority files then determine the decision
surfaces.
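
How an authority file depth greater than one yields a decision surface
that is not a single hyperplane may be illustrated with a small sketch.
The Python code below is not from the original work; it assumes a plain
nearest neighbor rule with squared Euclidean distance (the distance
weighting is omitted), and the names classify and authority_files, as
well as the sample vectors, are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(authority_files, unknown):
    """Return (label, index, distance) of the stored vector nearest `unknown`.

    `authority_files` maps a class label to its list of stored vectors;
    the length of each list is that file's depth.
    """
    best = None
    for label, vectors in authority_files.items():
        for index, vector in enumerate(vectors):
            d = squared_distance(vector, unknown)
            if best is None or d < best[2]:
                best = (label, index, d)
    if best is None:
        raise ValueError("authority files are empty")
    return best


if __name__ == "__main__":
    # Class A has depth 2 and lies on both sides of class B, so no single
    # hyperplane separates them; two stored vectors for A handle this.
    authority_files = {"A": [(0.0, 0.0), (4.0, 0.0)], "B": [(2.0, 0.0)]}
    for point in [(0.2, 0.0), (1.9, 0.0), (3.9, 0.0)]:
        print(point, classify(authority_files, point)[0])
```

With a depth of one for class A, one of the two outer regions would
necessarily be assigned to class B.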

For a practical pattern recognition system operating in a
changing environment, some sort of adaptive decision surface is desirable.
This suggests a classification routine that is able to adjust its
authority files to obtain a more nearly optimum set of decision surfaces.
Such a routine should also be able to add new classes if necessary.
Each classification attempt then, by such a pattern recognition system,
would have the potential of changing the authority files (and hence
decision surfaces) to adjust to new data. Two algorithms to provide
this automatic training are discussed in the next section.


3.3  Two  Training  Algorithms 

Consider a classifier of the nearest neighbor or distance-weighted
k-nearest neighbor type with authority files of depth n.
A random feature vector is applied to the classifier and a
classification made. If the classification is correct, the closest (Euclidean
distance) vector in the authority files to the unknown vector that is
also in the same class as the unknown vector is moved to the top of
its authority file. If the classification is not correct, the unknown
vector is 'pushed' on top of the appropriate authority file. All other
vectors in this file are pushed down, and if the file is full, the
bottom vector will be lost.

Under these rules the most 'important' vectors, i.e., those
used most often in identification and that define 'critical' areas
of the decision surfaces, will 'bubble' to the top of the authority
files, while the vectors seldom used in identification will 'sink' to
the bottom of the file. When a vector is incorrectly classified, its
insertion into the authority file redefines the decision surface so
that the misclassification will not occur again. If the authority file
is full, it is the least used vector that will be lost when a new
vector is added because of the bubble action described above.
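
The bubble rule may be sketched as follows. This Python fragment is an
illustration under stated assumptions, not the original PDP-9 routine:
a plain nearest neighbor classifier stands in for the distance-weighted
k-nearest neighbor classifier, each authority file is a list whose
index 0 is the top of the file, and the names bubble_train_step,
nearest_label, and sqdist are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(files, vector):
    """Label of the stored vector closest to `vector` (None if all files are empty)."""
    best_label, best_d = None, None
    for label, stack in files.items():
        for stored in stack:
            d = sqdist(stored, vector)
            if best_d is None or d < best_d:
                best_label, best_d = label, d
    return best_label


def bubble_train_step(files, depth, vector, true_label):
    """Apply one supervised update; return True if the classification was correct."""
    stack = files.setdefault(true_label, [])  # index 0 is the top of the file
    if nearest_label(files, vector) == true_label:
        # Correct: move the closest same-class vector to the top of its file.
        i = min(range(len(stack)), key=lambda k: sqdist(stack[k], vector))
        stack.insert(0, stack.pop(i))
        return True
    # Incorrect: push the unknown on top; if the file was already full,
    # the bottom vector is lost.
    stack.insert(0, tuple(vector))
    del stack[depth:]
    return False


if __name__ == "__main__":
    files = {}
    samples = [((0, 0), "A"), ((4, 0), "B"), ((3, 0), "A"),
               ((0.1, 0), "A"), ((10, 0), "A")]
    for vector, label in samples:
        ok = bubble_train_step(files, 2, vector, label)
        print(vector, label, "correct" if ok else "inserted", files)
```

The true class label passed to each step plays the role of the teacher.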


A variation on this technique is to associate a counter with
each vector in the authority files. Again, a random feature vector is
applied to the classifier and a classification made. If the
classification is correct, the closest feature vector in the authority files
to the unknown vector that is also in the same class as the unknown
vector has its counter incremented. If the classification is not
correct, the vector in the appropriate authority file with the lowest
number in its counter is replaced by the unknown vector and the counter
is reset.
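
A corresponding sketch of the counter-based rule is given below, again
in Python and again only as an illustration. Several details are
assumptions not stated in the text: a plain nearest neighbor classifier
is used, a new counter starts at zero, a vector is simply appended
while the file is not yet full, and ties for the lowest count are
broken in favor of the earliest entry. The names activity_train_step,
nearest_label, and sqdist are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(files, vector):
    """Label of the stored vector closest to `vector` (None if all files are empty)."""
    best_label, best_d = None, None
    for label, entries in files.items():
        for stored, _count in entries:
            d = sqdist(stored, vector)
            if best_d is None or d < best_d:
                best_label, best_d = label, d
    return best_label


def activity_train_step(files, depth, vector, true_label):
    """Apply one supervised update; return True if the classification was correct.

    Each authority file is a list of [vector, count] entries.
    """
    entries = files.setdefault(true_label, [])
    if nearest_label(files, vector) == true_label:
        # Correct: credit the closest vector of the correct class.
        closest = min(entries, key=lambda e: sqdist(e[0], vector))
        closest[1] += 1
        return True
    if len(entries) < depth:
        # Assumption: while the file is not full, simply add the vector.
        entries.append([tuple(vector), 0])
    else:
        # Replace the entry with the lowest count and reset its counter.
        victim = min(range(len(entries)), key=lambda k: entries[k][1])
        entries[victim] = [tuple(vector), 0]
    return False


if __name__ == "__main__":
    files = {}
    samples = [((0, 0), "A"), ((4, 0), "B"), ((0.2, 0), "A"),
               ((3, 0), "A"), ((0.1, 0), "A"), ((10, 0), "A")]
    for vector, label in samples:
        activity_train_step(files, 2, vector, label)
    print(files)
```

In the demonstration the frequently used vector (0, 0) survives the
final insertion because its count is higher than that of (3, 0).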


The effect of this rule is similar to that of the bubble sort
rule described above. Authority file vectors with the highest 'activity,'
i.e., those used most often in classification, are retained, while
seldom used vectors are removed so that a new, and perhaps more
important, vector may be added. These two authority file sorting rules do
require a 'teacher,' since the classification of the 'unknown' vector
must be known to be correct or not, and if incorrect the class of the
'unknown' vector must be known so that the correct authority file may
be modified. These requirements are not restrictive if these rules
are used when the authority files are first filled from a training set,
but if the authority files are to be modified when the pattern
recognition system is operating, some sort of feedback as to the correctness
of the classification must be provided. The first application, that
of initial training, is investigated in the following sections, while
the second application of these sorting rules, that of decision
surface modification during operation, is left for future research.


3.4  An  Experimental  Comparison 

A simple experiment was performed to verify the expected
performance of the two authority file sorting rules. The circular
autocorrelation feature extractor and the distance-weighted k-nearest
neighbor classifier were used. Authority file depth was set to three.
The training and testing set was samples 1-6 of the facial profile
images for each of the ten classes. In the first test, the authority
files' contents were adjusted manually by a cut-and-try method to obtain
the best possible recognition accuracy. Next the bubble sort
technique was run for 500 random selections from the training set and then
tested. Then the activity sort rule was run for 500 random selections
and tested. The results are shown in Table 3.11. The 'samples to train'
column indicates the point (in number of random inputs) after which the
authority files began to 'oscillate.' Since some authority files were
not deep enough to hold all the vectors necessary for 100% recognition
accuracy on that class, the lower importance vectors would swap or
oscillate in and out of the files. At this point, no further increase
in recognition accuracy could be obtained.

                  Authority  Samples         Recognition Accuracy
Test    Sort        File       to                                No. classes
No.     Type        Depth     Train    No. Correct   % Correct   100% correct

 1      manual        3        --          49           81.7          7
 2      bubble        3        111         48           80.0          6
 3      activity      3        111         52           86.7          6
 4      activity      4        179         59           98.3          9
 5      activity      2        181         43           71.7          3

Table 3.11 Sort Algorithm Recognition Accuracies


From Table 3.11 it can be seen that the manual and bubble sort
produced about the same recognition accuracy, while that of the
activity sort was slightly higher. One reason that the activity sort
performed better than the bubble sort may be the inherent
integration provided by the activity sort counter. Consider, for
example, an authority file which contains a vector that is often used
(i.e., is the closest to the input vector) in classification. Suppose
now an input sequence occurs in which the other vectors in the authority
file are used and hence bubble to the top of the file, or several
misclassified vectors are inserted into the file. If this occurs the
important (often used) vector may end up at the bottom of the authority
file and be pushed out by the next vector inserted. Since the input
sequence is random, such a sequence is most likely to occur for small
file depths. With the activity sort an often used vector will build
up a count significantly larger than less important vectors, and an
input sequence that did not use this vector would have to be long
enough to build up the counts of the other feature vectors to values
greater than that of the 'important' vector before it would be replaced.
Since a longer input sequence of this type is required, it is less
likely to occur and the activity sort files remain more stable.

The automatic training rules were not able to match the manual
selection of feature vectors for the number of classes 100% correct,
which is not surprising since the automatic routines were designed to
optimize the total recognition accuracy without regard to class.
Table 3.12 shows the recognition accuracy of the three methods for each
class. Included in Tables 3.11 and 3.12 are two training attempts with
the activity sort rule for authority file depths of 2 and 4. The
recognition accuracy behaved as expected, lower for a depth of 2 and
higher for a depth of 4. Notice that the number of inputs required to
train the authority files increases for depths of 2 and 4. For a file
depth of 2 or 4 and a random input selection with a uniform probabi-

Test           Recognition Accuracy, % Correct/Class
No.      1     2     3     4     5     6     7     8     9    10

 1      50   100   100   100   100    50    17   100   100   100
 2      50   100   100    83   100    67     0   100   100   100
 3      50   100   100    83   100    50    83   100   100   100
 4      83   100   100   100   100   100   100   100   100   100
 5      33    83   100    83    67    50    50   100    50   100

Table 3.12 Sort Algorithm Recognition Accuracies per Class


The activity sort rule has a feature which may be useful if it
is to be used to maintain the authority files in an operating pattern
recognition system. It can be seen that under such conditions the
counts associated with some vectors may become very large. The activity
sort rule may be modified to divide the contents of all counters for a
given authority file by a constant when any count in that file exceeds
a preset value. This technique tends to give the authority files a
certain 'forgetfulness.' Any authority file vector that is not the
closest vector to the unknown (i.e., that does not have its counter
incremented) a certain number of times for some number of
classifications (depending on the values of the constants) will have its count
reduced to zero by the divisions and will be erased when a new vector
is entered. Because of its high recognition accuracy and the potential
for implementing this limited memory time feature, the activity sort
rule was chosen to fill the authority files in the remaining work.
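
The counter division can be sketched in a few lines of Python. This is
an illustration only; integer (truncating) division is assumed, which
is what allows a small count to reach zero, and the function name
decay_counts and the parameter names limit and divisor are hypothetical
stand-ins for the preset value and the constant mentioned above.

```python
def decay_counts(counts, limit, divisor):
    """Return the counters of one authority file after the forgetting step.

    If any count exceeds `limit`, every count in the file is divided by
    `divisor` using integer division; otherwise the counts are unchanged.
    """
    if divisor < 2:
        raise ValueError("divisor must be at least 2")
    if any(c > limit for c in counts):
        return [c // divisor for c in counts]
    return list(counts)


if __name__ == "__main__":
    # One heavily used vector pushes the file over the limit; the two
    # rarely used vectors drop to zero and become candidates for removal.
    print(decay_counts([130, 40, 3, 1], limit=128, divisor=4))
    print(decay_counts([100, 40, 3, 1], limit=128, divisor=4))
```

A larger divisor or a lower limit shortens the memory of the file; the
appropriate values would have to be found experimentally.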


3.5 A Large Sample Size Experiment

Although the activity sort automatic training algorithm seemed
to perform well in the experiments described in the preceding section,
there were two major flaws in these experiments. The input sample size
was really too small to determine conclusively how well the training
algorithms worked, and the test samples were the same as the training
samples rather than independent data. It was decided to undertake an
experiment using the data and pattern recognition routines of Dudani
[5], with the activity sort rule for authority file construction, to
obtain a more meaningful measure of the algorithm's performance.

Dudani addressed the problem of automatic aircraft identification.
He defined a six-class problem and used a moment invariant feature
extractor. The six authority files consisted of 551 12-tuples each.
The authority files were obtained from images of the aircraft at roll
angles from 0 to 90° and azimuth angles from -70 to 70°, both in
5-degree increments (19 roll angles by 29 azimuth angles). The total
number of vectors in the authority files was thus 3306. A
distance-weighted k-nearest neighbor classifier was
used. Dudani's test set consisted of 132 images obtained independently
and in addition to the 3306 images used in the authority files. Of the
132 test images, 22 images were from each of the six classes, with random
roll and elevation angles. The performance of the original system is
shown on the first line of Table 3.13. The last column lists the image
numbers from the test set that were incorrectly classified.

Authority    Training     Recognition Accuracy     Incorrectly Classified
File Depth   Samples     No. Correct  % Correct    Test Image Numbers

   551         --           126         95.4       4,23,31,83,125,130

   184        3000          116         87.9       4,5,16,18,23,31,41,
                                                   61,62,118,122,125,
                                                   128,129,130,131

   184        6000          121         91.7       16,23,31,64,75,90,
                                                   125,127,128,130,131

Table 3.13 Activity Sort Rule on a Large Training Set

To test the activity sort training rule, the authority file
depths of Dudani's classifier were arbitrarily reduced to one-third
of their original size, from 551 to 184. The activity sort rule was
used to pick the new authority file contents from the training set
consisting of the 3306 vectors which comprised the original authority
files. After 3000 random samples from the training set had been examined
by the training algorithm, the recognition accuracy was tested with
Dudani's 132 independent test images. The training then continued for
another 3000 random samples from the training set and the recognition
accuracy was again tested with the 132 test images. The results are
shown above in Table 3.13. It can be seen that after 6000 samples the
recognition accuracy was approaching that obtained with the 551-deep
authority files (92% versus 95%). It is interesting to note that test
image numbers 23, 31, 125, and 130 are missed in all cases; this might
indicate the presence of inaccurate training or test data. In Table
3.14 the number of vectors in each authority file and the number of
samples in each class presented to the training algorithm during
training are given at 250-sample increments. It can be seen that the
number of samples of each class presented to the training algorithm
remained fairly equal. The number of vectors in the authority files
versus the total number of input samples during training is graphed
in Figure 3.2. The authority files seem to have filled at a
logarithmic rate. Note that the authority files were not full when the
experiment was terminated and that the number of vectors in each file varies
considerably from class to class.

The experiment was terminated after 6000 training samples
because of the computation time involved (60 hours on the PDP-9
computer). It would have been desirable to continue until the
authority files were full, or to reduce the depth of the authority files
(to, say, 50) to determine exactly which vectors would be saved and
the recognition accuracy then achieved. This experiment does show,
however, that the vectors selected by the activity sort rule will
provide good recognition accuracy when tested on an independent test set.


CHAPTER  IV 


SYSTEM  OPTIMIZATION 


4.1  Introduction 


This  chapter  concerns  itself  with  the  optimization  of  the 
circular  autocorrelation  feature  extractor,  a comparison  of  the 
nearest  neighbor  classifier  and  the  distance-weighted  k-nearest 
neighbor  classifier,  and  the  need  for  normalization  of  the  feature 
vector.  A demonstration  of  the  translation  and  size  invariance  of 
the  circular  autocorrelation  function  is  provided.  The  behavior 
of  the  circular  autocorrelation  function  under  rotation  is  also 
demonstrated.  The  circular  autocorrelation  function  performance 
for various values of parameters a_m, M, and N is investigated.
The 120 facial profile feature vectors generated with the chosen
a_m, M, and N parameters are listed.

Two distance-weighted k-nearest neighbor classifier weight
functions are described, one dependent upon the Euclidean distance
between the unknown feature vector and the authority file vector,
e, and the other dependent upon e^2. Recognition accuracy of a
classifier using each function is obtained. The recognition accuracy
of a nearest neighbor classifier is compared to that of a
distance-weighted k-nearest neighbor classifier. Normalization of the feature
vectors before classification is discussed, and the mean and standard
deviation of the 12 circular autocorrelation feature vector components
over the 120 facial profile samples are computed.

Although the feature extractor and classifier are discussed in
separate sections, the development and optimization of both occurred
simultaneously. It should also be pointed out here that the two
algorithms interact to some extent, making optimization a very difficult
task.


4.2  Circular  Autocorrelation 

After the circular autocorrelation function feature extractor
was written, it was felt that an experimental demonstration of size
invariance and behavior under image rotation was in order. It was
decided to arbitrarily set the circular autocorrelation parameters at:

    M = 1
    N = 12
    a_m = 1/2

so that

    u = \tfrac{1}{2} \cos\left( \frac{n\pi}{12} \right)                (4-1)

    v = \tfrac{1}{2} \sin\left( \frac{n\pi}{12} \right)                (4-2)

The area A was computed as:

    A = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(m\gamma, n\gamma)    (4-3)

that is, the sum of the one cells. The actual g(1,n) output was
multiplied by 100 and truncated. It was felt that this would be
sufficiently accurate and allowed integer arithmetic to be used in
the classifier. This may be written as:

g(l,n)  - [ g(u,v)]  . (A— 4) 
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A simple  experiment  was  performed  to  verify  the  theory  and 

also  to  obtain  an  idea  of  the  amount  of  feature  vector  variation  to 

be  expected  for  real  Images.  A white  rectangle  9"  x 12”  was  placed 

in  front  of  the  television  camera  and  the  feature  vector  for  various 

rotation  angles  wa3  computed.  The  results  are  shown  in  Table  4.1  as 

samples  number  1-17.  The  numbers  in  the  area  column  are  the  total 

number  of  one  cells  in  each  array.  The  feature  vector  of  sample 

number  1 was  used  as  the  authority  file  vector.  Since  a rotation  of 

the  input  image  by  ir/N  radians  (in  this  case  n/ 12  radians  or  15°) 

causes  the  circular  autocorrelation  feature  vector  components  to 

rotate  (see  Section  2.5  and  Eqs.  2-50  and  2-51);  each  unknown  feature 

vector  must  be  compared  against  the  12  possible  variations  of  the 

authority  file  vector  to  find  the  best  match.  The  numbers  in  the 

2 

closest  match,  angle,  and  error  columns  refer  to  the  ir/12  radian 
rotation  (angle)  for  which  the  smallest  Euclidian  distance  squared 
(error^)  between  the  unknown  and  sample  number  1 was  obtained. 
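
The search over the 12 variations of the authority file vector can be
sketched as follows. This is only an illustration, not the program used
in this work: it assumes that the "rotation" of the feature vector
components under a π/12 image rotation is a cyclic shift of the 12
components (the exact relation is given by Eqs. 2-50 and 2-51, which
are not reproduced here), and the function name `best_rotation_match`
and the sample vector are hypothetical.

```python
def best_rotation_match(unknown, authority):
    """Compare `unknown` against every cyclic shift of `authority`.

    Returns (shift, error2): the number of positions shifted and the
    smallest squared Euclidean distance found.  Treating a pi/N image
    rotation as a one-position cyclic shift is an assumption.
    """
    n = len(authority)
    best_shift, best_error2 = 0, None
    for shift in range(n):
        rotated = authority[shift:] + authority[:shift]
        error2 = sum((u - a) ** 2 for u, a in zip(unknown, rotated))
        if best_error2 is None or error2 < best_error2:
            best_shift, best_error2 = shift, error2
    return best_shift, best_error2


# Hypothetical 12-component authority vector and a shifted copy of it.
authority = [5, 7, 12, 22, 36, 48, 52, 44, 33, 20, 12, 7]
unknown = authority[3:] + authority[:3]
shift, error2 = best_rotation_match(unknown, authority)
print(shift, error2)
```

Under the stated assumption, the returned shift plays the role of the
angle column and the returned squared distance the role of the error^2
column of Table 4.1.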

Because of the rectangle's symmetry, the circular autocorrelation
function feature vector of the rectangle should also be symmetric,
that is:

    $g(1,n) = g(1, 14-n)$    (4-5)

for n = 2,3,4,5.

That this feature vector symmetry was not quite achieved points to
distortion in the input system.
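
A check of the symmetry relation (4-5) on a measured feature vector
might look like the sketch below. The two vectors are hypothetical and
are not taken from Table 4.1; the function name is likewise only
illustrative.

```python
def symmetry_residuals(g):
    """Return g(1,n) - g(1,14-n) for n = 2..5.

    `g` holds the 12 components in order, so g[0] is g(1,1).
    """
    return {n: g[n - 1] - g[(14 - n) - 1] for n in range(2, 6)}


# Hypothetical vectors: one exactly symmetric, one slightly distorted.
symmetric = [5, 7, 12, 22, 36, 48, 52, 48, 36, 22, 12, 7]
distorted = [5, 7, 12, 22, 36, 48, 52, 47, 35, 20, 12, 8]
print(symmetry_residuals(symmetric))
print(symmetry_residuals(distorted))
```

Non-zero residuals of a few units, as in the second vector, are the
kind of departure from symmetry that would be attributed to input
distortion.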

It can be seen from Table 4.1 that the error for images 3-6
is considerably larger than that for images at π/12 radian rotation
increments. This is to be expected since the feature vectors for
images rotated at other than π/12 increments will not, in general,
directly correspond to the zero radian feature vector (see Section
2.5). The other errors are probably due to input distortion, and
give an idea of the ultimate accuracy of the system. Size invariance
was tested by moving the camera farther from the rectangle and
obtaining two more images. These are given as samples numbered 18
and 19 in Table 4.1. The area of these two images was reduced by
about a factor of 5 from the other images, producing images less than
one-half the size of the comparison image. As can be seen from the
table, there was little change in the feature vectors.
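
The reason a change of image size leaves the feature vector nearly
unchanged can be illustrated by computing the sample offsets (u, v).
The sketch below assumes the form described in this work, a radius of
a_m times the square root of the image area and N angles spaced π/N
apart; the zero starting angle, the function name, and the two area
values are assumptions made for the illustration.

```python
import math


def sample_points(area, a_m=0.5, n_angles=12):
    """Return the (u, v) sample offsets for one radius constant a_m.

    Assumed form: radius = a_m * sqrt(area), angles spaced pi/n_angles
    apart starting at zero.
    """
    radius = a_m * math.sqrt(area)
    step = math.pi / n_angles
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(n_angles)]


# Hypothetical areas (counts of one cells) differing by a factor of 5.
near = sample_points(5000)
far = sample_points(1000)
ratio = math.hypot(*far[0]) / math.hypot(*near[0])
print(round(ratio, 3))
```

Because the radius follows the square root of the area, a factor of 5
reduction in area shrinks the sampling radius by the same linear factor
as the image itself (about 0.45), so the offsets fall on corresponding
parts of the smaller image.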

Once it was verified that the feature extractor was performing
as expected, the next step was to optimize the values of a_m, M,
and N. This was attempted with the following experiment. An input
space of 60 facial profile images, 6 images per subject for the 10
subjects, was chosen. A classifier of the distance-weighted k-nearest
neighbor type with k fixed at 10 and the (e_pq)^2 dependent weight
function (to be described in the next section) was used. The classifier
searched only the zero angle feature vector sequences, that is,
rotational invariance was not attempted. The authority file depths
were set to three, i.e., three feature vectors per class were used to
define each decision surface, and the authority files were filled
using the activity sort training algorithm (described in Chapter III).
Six sets of values for a_m, M, and N for the circular autocorrelation
function were then compared by training the machine on the circular
autocorrelation feature vectors derived from the 60 images and testing
the recognition accuracy of the machine using the same 60 images.

The results of this experiment are summarized in Tables 4.2 and 4.3.
Typical feature vectors for parameter sets 1, 3, 4, and 5 are given
in Table 4.4, while Table 4.5 contains the feature vectors for all
120 facial profile images for parameter set 2: a_m = 1/2, M = 1, N = 12.

From Table 4.3 it can be seen that subjects 1, 6, and 7 were
consistently misclassified as belonging to another class for all
parameter sets, and subject 9 was misclassified for 4 of the 5 sets.
There was no apparent pattern to the misclassifications. It seems
that these particular images are in some way more 'variable' than
the other images and the region in the feature vector space to which
they belong is harder to define. The low recognition accuracy of
parameter sets 1, 3, and 5 is not surprising when the vectors in
Table 4.4 are examined. For parameter set 1 (a_m = 1) there are several
zero terms in the feature vector, terms which contribute no information,
while for parameter set 3 (a_m = 1/4) the terms of the feature vectors
exhibit little variation from class to class. The recognition accuracies

Parameter                              Recognition Accuracy
   Set      a_m           M    N    No. Correct   % Correct   No. of Classes
                                                               100% Correct
    1       1             1   12        47          78.3            5
    2       1/2           1   12        52          86.7            6
    3       1/4           1   12        49          81.7            6
    4       1/2, 1/[?]    2    6        51          85.0            6
    5       1/3, 1        2    6        50          83.3            5

Table 4.2  Recognition Accuracy for Selected Circular Autocorrelation
Function Parameters.


Parameter         Recognition Accuracy, % Correct/Class
   Set       1     2     3     4     5     6     7     8     9    10
    1       50   100    50   100   100    50    50   100    83   100
    2       50   100   100    83   100    50    83   100   100   100
    3       50   100   100   100   100    67    50   100    50   100
    4       50   100   100   100   100    67    67   100    67   100
    5       50    83   100   100   100    67    67   100    67   100

Table 4.3  Recognition Accuracy per Class for Selected Circular
Autocorrelation Function Parameters.




Parameter               Feature Vector Component, n
  Set   Class     1     2     3     4     5     6     7     8     9    10    11    12
                                                     (1)   (2)   (3)   (4)   (5)   (6)

    1       1     0     0     0     0    10    31    35    18     5     0     0     0
    3       1    48    51    52    58    64    68    70    69    66    57    50    49
    4       1     6    16    45    58    37    13   (0)   (7)  (34)  (57)  (27)   (5)
    5       1    31    39    57    64    56    35   (0)   (4)  (31)  (54)  (25)   (4)

    1       2     0     0     0     0     8    27    40    24     8     0     0     0
    3       2    44    44    48    56    62    67    70    68    63    55    47    43
    4       2     4     8    38    59    35    11   (0)   (2)  (27)  (55)  (25)   (3)
    5       2    28    33    54    63    53    32   (0)   (1)  (24)  (53)  (23)   (1)

    1       3     0     0     0     1    23    20    13     7     5     0     0     0
    3       3    38    41    47    56    62    64    63    59    53    47    38    36
    4       3     7    15    42    41    28    11   (0)   (6)  (36)  (36)  (22)   (6)
    5       3    23    35    54    53    41    25   (0)   (4)  (32)  (31)  (18)   (3)

    1       4     0     0     0     0     9    19    30    23    15     0     0     0
    3       4    40    41    44    49    54    60    63    67    65    57    49    43
    4       4   [?]     9    30    50    40    10   (0)   (4)  (23)  (47)  (31)   (4)
    5       4    23    28    44    56    56    32   (0)   (3)  (21)  (44)  (30)   (3)

    1       5     0     0     0     0    13    32    34    17     4     0     0     0
    3       5    44    47    53    61    67    68    66    64    58    51    47    45
    4       5     7    16    46    54    28    13   (0)   (6)  (39)  (54)  (20)   (6)
    5       5    29    40    60    59    48    34   (0)   (5)  (37)  (51)  (19)   (5)

    1       6     0     0     0     1    13    31    33    20     6     0     0     0
    3       6    38    43    47    57    63    67    65    63    58    51    42    41
    4       6     4    10    39    52    27     0   (0)   (4)  (33)  (48)  (20)   (4)
    5       6    21    34    53    59    46    28   (0)   (3)  (28)  (45)  (16)   (3)

    1       7     0     0     0     0     5    24    40    29     8     0     0     0
    3       7    49    51    54    59    65    70    71    71    68    59    54    51
    4       7     6    12    39    64    41    15   (0)   (5)  (30)  (61)  (31)   (6)
    5       7    34    38    58    67    60    38   (0)   (3)  (27)  (58)  (28)   (4)

    1       8     0     0     0     0    13    22    20    19    13     0     0     0
    3       8    44    50    53    53    55    54    52    54    54    50    49    48
    4       8    11    20    40    35    20    15   (1)  (12)  (37)  (36)  (13)   (8)
    5       8    31    40    47    44    42    37   (0)  (11)  (37)  (38)  (12)   (7)

    1       9     0     0     0     1    12    25    34    24    10     0     0     0
    3       9    40    44    48    54    58    64    66    67    60    53    45    42
    4       9     3    10    29    52    32    10   (0)   (3)  (24)  (48)  (24)   (3)
    5       9    24    34    47    58    49    31   (0)   (2)  (21)  (46)  (20)   (2)

    1      10     0     0     0     1     4    14    32    30    13     1     0     0
    3      10    42    43    44    48    55    60    61    62    59    53    49    45
    4      10     9    15    31    49    40    19   (0)   (8)  (22)  (46)  (30)   (8)
    5      10    28    31    46    54    51    38   (0)   (6)  (19)  (41)  (27)   (6)

Table 4.4  Typical Feature Vectors for Selected Circular Autocorrelation
Function Parameters.


[Table 4.5: feature vectors for all 120 facial profile images for
parameter set 2 (a_m = 1/2, M = 1, N = 12). Column headings: CLASS,
SAMPLE, NAME, FEATURE VECTOR COMPONENT. The table entries are not
legible in this copy.]



of parameter sets 2 and 4 are about equal; the limit here may be the
dimension (MN) of the feature vector. It was decided not to investigate
larger dimension feature vectors, and on the basis of its recognition
accuracy and simplicity, parameter set 2 was chosen to be used
with the circular autocorrelation function in the remainder of this
work.
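
The totals of Table 4.2 can be cross-checked against the per-class
percentages of Table 4.3, since each class contributed 6 images. The
short sketch below does this arithmetic; the percentages are those read
from Table 4.3, while the names `TABLE_4_3` and `summarize` are
introduced here only for illustration.

```python
IMAGES_PER_CLASS = 6  # 60 images, 10 subjects

# Percent correct per class (classes 1-10), as read from Table 4.3.
TABLE_4_3 = {
    1: [50, 100, 50, 100, 100, 50, 50, 100, 83, 100],
    2: [50, 100, 100, 83, 100, 50, 83, 100, 100, 100],
    3: [50, 100, 100, 100, 100, 67, 50, 100, 50, 100],
    4: [50, 100, 100, 100, 100, 67, 67, 100, 67, 100],
    5: [50, 83, 100, 100, 100, 67, 67, 100, 67, 100],
}


def summarize(percentages, per_class=IMAGES_PER_CLASS):
    """Return (number correct, percent correct, classes 100% correct)."""
    # Each rounded percentage corresponds to a whole number of images.
    correct = sum(round(p * per_class / 100) for p in percentages)
    total = per_class * len(percentages)
    perfect = sum(1 for p in percentages if p == 100)
    return correct, round(100 * correct / total, 1), perfect


summary = {s: summarize(row) for s, row in TABLE_4_3.items()}
for parameter_set, result in summary.items():
    print(parameter_set, result)
```

The per-class figures sum to 47, 52, 49, 51, and 50 correct
classifications for parameter sets 1 through 5.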


4.3  Comparison  of  Some  Classification  Algorithms 


In attempting to decide on the type of classifier to be used
in this pattern recognition system, statistical classifiers were
rejected because the feature vector sample size was too small to
determine the necessary probability densities. Of the non-statistical
classifiers, the nearest neighbor rule and the distance-weighted
k-nearest neighbor rule were selected as the most likely candidates to
give good recognition accuracy. Starting with the distance-weighted
k-nearest neighbor rule, a simple weight function was used:


    $w_{pq} = \frac{100}{(e_{pq} + 1)\,T}$    for  $0 \le e_{pq} \le 500$    (4-6)

    $w_{pq} = 0$    for  $e_{pq} > 500$

    $T = \sum^{*} \frac{1}{e_{pq} + 1}$    (4-7)

    * summed over the k smallest $e_{pq}$'s $\le 500$.


k was fixed at 10, so that only the ten lowest e_pq's were used in the
weight computation and therefore only the ten highest weights computed,
all other weights being set to zero. This weight function will tend
to give:

    $w_{pq} \to 100$    for  $e_{pq} \to 0$

    $w_{pq} \to 0$    for  $e_{pq} \to 1000$

    $w_{pq} \to w_{rs}$    for  $e_{pq} \to e_{rs}$.


The above relations show that this weight function satisfies the
criteria for a distance-weighted k-nearest neighbor classifier weight
function as expressed by Dudani [5]. The evidence of an authority
file vector close to an unknown input vector should be weighted
more heavily than the evidence of another authority file vector
which is at a greater distance from the unknown. This is accomplished
by having a weight function which varies with the distance
between the unknown and authority file vector in such a manner that
the weight decreases for increasing unknown to authority file vector
distance. The above relations deteriorate as the distance between
the unknown and the authority file vector increases (as e_pq approaches
1000). Any error (distance) above 500 is considered large enough to
make any correspondence between the unknown and authority file vector
unlikely and thus its corresponding weight is set to zero.
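
The behavior of this weight function can be illustrated with a short
sketch. It is written under stated assumptions rather than as the
program used in this work: the weight is taken to be inversely
proportional to (e_pq + 1), normalized by the sum T over the k nearest
authority file vectors within the 500 cutoff, and scaled so that a
single very close neighbor receives a weight near 100. The scale
constant, the function name, and the sample distances are assumptions.

```python
def distance_weights(errors, k=10, cutoff=500.0, scale=100.0):
    """Weights for a list of unknown-to-authority distances e_pq.

    Assumed form: w = scale / ((e + 1) * T), with T the sum of
    1 / (e + 1) over the k smallest distances not exceeding `cutoff`.
    All other weights are zero.
    """
    # Indices of the k smallest distances, then drop any beyond the cutoff.
    nearest = sorted(range(len(errors)), key=lambda i: errors[i])[:k]
    eligible = [i for i in nearest if errors[i] <= cutoff]
    weights = [0.0] * len(errors)
    if not eligible:
        return weights
    t = sum(1.0 / (errors[i] + 1.0) for i in eligible)
    for i in eligible:
        weights[i] = scale / ((errors[i] + 1.0) * t)
    return weights


# Hypothetical distances: one exact match, two equal, one beyond cutoff.
weights = distance_weights([0.0, 9.0, 9.0, 600.0])
print(weights)
```

In this form the weights of the eligible neighbors sum to the scale
constant, equal distances receive equal weights, and any distance above
the cutoff contributes nothing.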

With k set to 10, authority files of depth 3, and a feature extractor
with parameter set a_m = 1/2, M = 1, and N = 12, the classifier was
tested on facial profile images 1-6 of each class.

The authority file vectors were selected to give the best
recognition accuracy on the same 60 images and the result is given
in Table 4.6 as test 1.

In examining the results of the experiment, it was noticed
that the weight function seemed to assign too large values to terms
with large errors. Accordingly, the weight function was modified to:


    $w_{pq} = \frac{100}{((e_{pq})^2 + 1)\,T}$    for  $0 \le (e_{pq})^2 \le 500$    (4-8)

    $w_{pq} = 0$    for  $(e_{pq})^2 > 500$

    $T = \sum^{*} \frac{1}{(e_{pq})^2 + 1}$    (4-9)

    * summed over the k smallest $(e_{pq})^2$'s $\le 500$.


All other parameters were held constant and the experiment repeated.
The result is given as test 2 in Table 4.6. A significant improvement
in recognition accuracy was obtained, and therefore this weight function
was used for all subsequent work.
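
A distance-weighted k-nearest neighbor decision using the squared
distance may be sketched as below. Several points are assumptions made
for the illustration: the weights use the normalized inverse form with
a scale constant of 100, the unknown is assigned to the class with the
largest total weight (the rule described by Dudani [5]), and `None` is
returned when no authority file vector lies within the cutoff. The two
authority vectors are the parameter set 3 entries for classes 1 and 8
of Table 4.4, used here only as convenient sample data; the unknown
vector is hypothetical.

```python
from collections import defaultdict


def classify(unknown, authority, k=10, cutoff=500.0, scale=100.0):
    """Return the class with the largest summed weight, or None.

    `authority` is a list of (class_label, feature_vector) pairs.
    Weights follow the assumed form scale / ((e2 + 1) * T), where e2
    is the squared Euclidean distance to the unknown.
    """
    scored = sorted(
        ((sum((u - a) ** 2 for u, a in zip(unknown, vector)), label)
         for label, vector in authority),
        key=lambda pair: pair[0],
    )[:k]
    eligible = [(e2, label) for e2, label in scored if e2 <= cutoff]
    if not eligible:
        return None
    t = sum(1.0 / (e2 + 1.0) for e2, _ in eligible)
    votes = defaultdict(float)
    for e2, label in eligible:
        votes[label] += scale / ((e2 + 1.0) * t)
    return max(votes, key=votes.get)


authority = [
    (1, [48, 51, 52, 58, 64, 68, 70, 69, 66, 57, 50, 49]),
    (8, [44, 50, 53, 53, 55, 54, 52, 54, 54, 50, 49, 48]),
]
unknown = [47, 51, 53, 58, 63, 68, 70, 70, 66, 57, 50, 48]
print(classify(unknown, authority))
```

Because the squared distance grows quickly, distant authority file
vectors receive much smaller weights than under the first weight
function, or fall outside the cutoff altogether.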


With the weight function for the distance-weighted k-nearest
neighbor classifier selected, a comparison of this classifier and the
nearest neighbor classifier was performed. For the feature extractor
parameters a_m = 1/2, M = 1, and N = 12, k = 10 for the distance-weighted
k-nearest neighbor classifier, and an authority file depth of 3, both
classifiers were trained and tested on the full set of 120 facial profile
images. Training was accomplished using the activity sort algorithm
discussed in Chapter III. The results of this experiment are shown in
Tables 4.7 and 4.8. It can be seen that the results are very similar
for the two classifiers, but influenced by the higher number of classes
classified 100% correctly and the work of Dudani [5], the
distance-weighted k-nearest neighbor classifier was chosen to be used
in the remainder of this work.

 Test       Weight         Recognition Accuracy
Number     Function      No. Correct   % Correct
   1       Eq. (4-6)         45          75.0
   2       Eq. (4-8)         49          81.7

Table 4.6  Weight Function Comparison for the k-Weighted Nearest
Neighbor Rule.


Classifier              Recognition Accuracy
   Type       No. Correct   % Correct   No. of Classes
                                         100% Correct
    NN            98          81.7            4
   k-NN           98          81.7            5

Table 4.7  Recognition Accuracy for Two Classifier Types


Classifier        Recognition Accuracy, % Correct/Class
   Type       1     2     3     4     5     6     7     8     9    10
    NN       75    83    67   100    83    83    58   100    75    92
   k-NN      18    83    67    33   100    83    67   100    75   100

Table 4.8  Recognition Accuracy per Class for Two Classifier Types


Feature Vector                  Standard
Component, n        Mean        Deviation
      1             5.01          2.72
      2             6.65          2.43
      3            12.03          2.76
      4            21.87          3.76
      5            35.98          5.75
      6            47.80          8.48
      7            51.74         10.34
      8            44.48          8.89
      9            32.91          6.15
     10            19.93          3.21
     11            11.67          3.27
     12             7.38          2.96

Table 4.9  Feature Vector Component Means and Standard Deviations


In this discussion of classification algorithms, no mention has
been made of feature vector normalization. Feature vector normalization
was not used in this work for two reasons. First, from the means
and standard deviations of the facial image vector components of
Table 4.5, given in Table 4.9, it can be seen that the standard
deviations are all of the same order of magnitude, indicating that a
component-by-component normalization would yield little, if any, gain
in recognition accuracy. Second, there is no evidence that a vector
length normalization was warranted. A vector length normalization
would leave the classification dependent only upon the angles between
the unknown and the authority file vectors.
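
The two kinds of normalization mentioned above can be written out
briefly. The sketch below is illustrative only; the function names and
the two-component sample vectors are hypothetical. It also checks
numerically that, after length normalization, the squared Euclidean
distance between two vectors equals 2 - 2 cos(θ), where θ is the angle
between them, which is why such a normalization leaves the
classification dependent only upon angles.

```python
import math


def zscore(vector, means, stds):
    """Component-by-component normalization: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]


def unit_length(vector):
    """Vector length normalization: scale the vector to length one."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Hypothetical two-component vectors.
a, b = [3.0, 4.0], [4.0, 3.0]
d2 = squared_distance(unit_length(a), unit_length(b))
print(d2, 2 - 2 * cosine(a, b))

# Scaling one vector changes its length but not the normalized distance.
scaled = [10 * x for x in a]
d2_scaled = squared_distance(unit_length(scaled), unit_length(b))
print(d2_scaled)
```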


Based upon the results of the experiments described in this
chapter, the circular autocorrelation parameters were set at a_m = 1/2,
M = 1, and N = 12. It was noticed that for weight equations (4-8)
and (4-9) usually only the first 5 nearest neighbors had significant
weights in the k-weighted nearest neighbor classifier. The
distance-weighted k-nearest neighbor classifier parameters were thus
selected as weight equations (4-8) and (4-9), and k = 5.


CHAPTER V
SUMMARY

5.1  Summary of Results

This thesis has presented a series of experiments on two-dimensional
pattern recognition leading to a system capable of
recognizing human facial profiles. A comparison of two feature
extraction techniques was made, one of which has been used previously
in an operating system, the other original to this work. Two
deterministic classification algorithms, a piecewise linear classifier
and a non-linear classifier, were compared. Two heuristic training rules
for authority file vector selection were described and shown to provide
decision surfaces for good class separability. The recognition accuracy
of the final system was found to be at least as good as that of
human recognizers presented with the same data.

The circular autocorrelation feature extraction technique
developed in this thesis was shown to be invariant under image size
change and translation, and to have predictable behavior under image
rotation. Several experiments were performed to determine the optimum
constants for this function and it was found that for a 12-dimensional
feature vector a constant radius of one-half the square
root of the image area and angle increments of 15 degrees gave optimum
results on the test images. The radius constant was found to be not
critical, but for values less than one-half, the feature vectors
seemed to have less variation from class to class; for values greater
than one-half, several feature vector components were zero. In both
cases the recognition accuracy was reduced.


The moment invariant feature extraction technique described by
Hu [35] and used by Dudani [5] was also used for facial profile
recognition. In all cases the maximum recognition accuracies obtained
using a moment invariant feature extractor were significantly less than
those obtained using a circular autocorrelation feature extractor.
Considering the 95% recognition accuracy obtained by Dudani on aircraft
identification using moment invariants, the 55-70% recognition accuracy
obtained on facial profiles was disappointingly low. Three possible
reasons for this low recognition accuracy may be advanced. First, the
facial profile recognition problem was defined with 10 classes while the
aircraft recognition problem had only 6. The increased number of
classes increases the difficulty of class separation by the classifier
and can thus reduce the recognition accuracy. Second, facial profiles
are very similar, with only subtle variations to distinguish one class
from another, while in many views different aircraft have distinctly
different silhouettes. Facial profile recognition may therefore be a
more difficult task. This assumption is supported by the recognition
accuracy of humans at the two tasks. Human recognition accuracies
were 73-76% correct for facial profiles and 79-92% correct for aircraft.
Third, the input system produced images with considerable variation from
sample to sample within a given class. This means that test images
presented to the machine for recognition may vary considerably from
the training image. This effect was less noticeable with the aircraft
images because of the higher image/background contrast ratio, again
indicating that the facial profile recognition problem is more difficult.
In aircraft recognition, a pattern recognition system using the moment
invariant feature extractor performed slightly better than humans (95%
versus 79-92%), while for facial profile recognition the pattern
recognition system using the moment invariant feature extractor performed
slightly worse than humans (55-70% versus 73-76%).

Two  weight  functions  for  the  distance-weighted  k-nearest  neigh- 
bor classifier  were  Investigated.  It  was  found  that  the  weight  function 

dependent  upon  the  Euclidian  distance  squared  (e  ) pave  about  7%  better 

pq  - 

recognition  accuracy  than  the  weight  function  dependent  upon  Euclidian 

2 

distance  (e^) . Using  the  weight  function  dependent  upon  ep(j,  it  wa3 
found  that  only  the  first  five  nearest  neighbors  had  significant  weights 
assigned  and  therefore  k was  fixed  at  five  instead  of  depending  upon 
Euclidian  distance.  The  nearest  neighbor  classification  algorithm  was 
compared  against  the  distance-weighted  k-nearest  neighbor  classifier. 
Although  the  significantly  better  recognition  accuracy  of  the  distance- 
weighted  k-neareBt  neighbor  classifier  described  by  Dudani  [5]  was  not 
observed,  the  distance-weighted  k-nearest  neighbor  classifier  did  give 
a larger  number  of  classes  100%  correctly  classified.  There  are  two 
possible  explanations  for  this.  The  weight  function  used  in  Dudani's 
classifier  differs  slightly  from  that  used  in  this  classifier,  and 
Dudani used different training and test data, while in this comparison
the training data was also used as the test data.
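
As a rough illustration of the classifier discussed above, the following Python sketch votes among the k nearest authority file vectors, weighting each vote by a function of the squared Euclidian distance. The exact weight functions used in this work are not reproduced in this chapter; the inverse squared distance weight `1 / (e2 + eps)`, the function name, and the toy data are assumptions chosen only for illustration.

```python
from collections import defaultdict


def classify_weighted_knn(unknown, authority_file, k=5, eps=1e-9):
    """Distance-weighted k-nearest-neighbor vote.

    authority_file is a list of (feature_vector, class_label) pairs.
    The weight 1 / (e2 + eps) is an ASSUMED stand-in for the weight
    function of the thesis; e2 is the squared Euclidian distance.
    """
    scored = []
    for vector, label in authority_file:
        e2 = sum((a - b) ** 2 for a, b in zip(unknown, vector))
        scored.append((e2, label))
    scored.sort(key=lambda pair: pair[0])  # nearest first

    votes = defaultdict(float)
    for e2, label in scored[:k]:
        votes[label] += 1.0 / (e2 + eps)
    # The class with the largest accumulated weight wins.
    return max(votes, key=votes.get)


# Hypothetical two-class authority file with 2-dimensional vectors.
authority = [
    ((0.0, 0.0), "A"), ((0.1, 0.0), "A"),
    ((1.0, 1.0), "B"), ((1.1, 1.0), "B"), ((0.9, 1.0), "B"),
]
print(classify_weighted_knn((0.2, 0.1), authority, k=5))
```

In this toy case class B supplies three of the five neighbors, yet the two much closer class A vectors dominate the weighted vote. Setting `k=1` reduces the sketch to the plain nearest neighbor rule used in the comparison.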

Two heuristic training algorithms, the bubble sort rule and
the activity sort rule, were described and a rationale for their operation
was presented. The two sorting rules were compared in a simple
experiment  and  the  activity  sort  rule  was  found  to  give  slightly 
better  recognition  accuracy.  The  number  of  samples  necessary  to  train 
the  authority  files  using  the  activity  sort  rule  was  found  to  be  about 
three  times  the  training  set  size.  To  verify  that  the  activity  sort 
rule  did  indeed  select  the  authority  file  vectors  to  provide  the  best 
recognition accuracy on an independent test set, a large sample size
experiment was performed. Using Dudani's aircraft feature vectors and
pattern recognition system, the authority files were generated using
the activity sort rule. After about two passes through the training
set the recognition accuracy was tested with an independent test set.
The  recognition  accuracy  was  found  to  be  only  slightly  less  than  that 
obtained  by  Dudani  (92%  versus  95%),  while  the  authority  file  size 
was  about  one-sixth  that  used  by  Dudani.  These  results  were  taken  as 
verification  of  the  expected  operation  of  the  activity  sort  rule  for 
automatic  authority  file  training. 

The facial profile recognition system used the circular
autocorrelation feature extractor, the distance-weighted k-nearest neighbor
classifier and the activity sort rule for training. The 120 facial
profile images were divided into training and test sets and the
recognition accuracies for various compositions of training and test sets
versus authority file depth were found. The results showed that the
maximum recognition accuracy on the test set did not occur at maximum
file depth, but occurred around one-half the maximum file depth. The
results also indicated that maximum recognition accuracy is achieved by
using a training set large enough to contain the less probable feature
vector variations on any class, resulting in a better defined decision
surface. Recognition accuracies on the order of 80% (76-86%) were
achieved in several instances and a maximum recognition accuracy of
90% was achieved using test images independent from the training images.
When three human recognizers were presented with the same data, their
maximum recognition accuracy was 73-76%.


5.2  Extensions  of  This  Research 


Like any research, this work has raised more questions than it
has answered. Indeed, every aspect of this work requires further
investigation. The circular autocorrelation function has been shown to
work very well, even better than moment invariants, on this particular
problem. Since circular autocorrelation is computationally both simple
and fast, its performance on other problems might very well be worth
investigating. Dudani [5] has shown that the distance-weighted k-nearest
neighbor rule can perform significantly better than the nearest neighbor
rule. This was not observed in this work, and an investigation into
the reasons for this may be warranted, since the nearest neighbor rule
is a simpler classification algorithm.


The two automatic training rules described in this thesis need
a much more exhaustive treatment. No attempt was made here to
provide a theoretical explanation of their performance; this omission
should be corrected. These two rules seem to be very useful for
authority file vector selection; their applicability to other pattern
recognition systems with other feature vector types should be
investigated. The experiment using Dudani's data and the activity sort
rule described in Section 3.5 should be repeated with smaller authority
file depths to determine how the activity sort rule performs under these
conditions and just which feature vectors are used.

For  facial  recognition,  several  hardware  improvements  need  to 
be  made.  The  most  glaring  fault  of  the  hardware  used  was  the  television 
camera's  inability  to  reproduce  a facial  profile  accurately.  A more 
sophisticated input device, one able to distinguish between flesh tones
and hair, clothing, and other background colors, is necessary. Probably
the best way to do this is with a color television camera with the
capability of selecting one range of colors and rejecting all others. Color
information may also prove useful in the identification process.
Jagadeesa [45] has recently developed a color television camera interface
for the Ohio State University Department of Electrical Engineering
PDP-9 computer, and it is to be expected that results of the use of color
information in this and other image recognition problems will be
forthcoming. The prefiltering, edge extraction, and feature vector extraction
could be speeded up considerably by implementing a binary array processor
in hardware rather than simulation [41].

Besides hardware improvements, other facial images should be investigated.
With the proper input system, repeatable binary images of the front view
of the face should be possible. Such images may have the potential
for providing more information than the profile. Even more
information may be obtained by using several binary images with different
thresholds to obtain a gray scale. The problem of automatic selection
of  a face  from  a scene  has  been  totally  ignored.  This  problem  may 
have  to  be  solved  before  a practical  recognition  system  can  be  produced. 
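
The idea of building a gray scale from several binary images can be sketched as follows. The sketch assumes each binary image is produced by thresholding the same scene at a different level, so that summing the binary images cell by cell gives a gray level equal to the number of thresholds exceeded. The function names, threshold values, and toy scene are hypothetical.

```python
def threshold(scene, level):
    """Return a binary image: 1 where the scene brightness exceeds level."""
    return [[1 if value > level else 0 for value in row] for row in scene]


def gray_from_binaries(binaries):
    """Sum binary images cell by cell to get a coarse gray-scale image."""
    rows, cols = len(binaries[0]), len(binaries[0][0])
    return [[sum(b[r][c] for b in binaries) for c in range(cols)]
            for r in range(rows)]


scene = [[10, 120, 250],
         [60, 200, 30]]          # hypothetical brightness values
levels = [50, 100, 150, 220]     # hypothetical threshold settings
binaries = [threshold(scene, level) for level in levels]
gray = gray_from_binaries(binaries)
print(gray)
```

With n thresholds this yields n + 1 gray levels per cell, so the added information grows with the number of binary images taken.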

This  work  along  with  others  [5,6,7]  has  shown  that  pattern 
recognition  of  two-dimensional  images  by  machine  is  possible  on  a well 
defined  problem  in  a laboratory  environment.  Recognition  accuracies 
in  excess  of  those  achieved  by  humans  have  been  demonstrated.  The 
next step is to remove the machine from the laboratory and apply this
knowledge to develop systems for less ideal conditions.


APPENDIX A: SELECTED FACIAL PROFILES


This appendix consists of twenty human facial profile images,
two images per class. The images were photographed from the Tektronix
611 display after they had been filtered to remove system noise and
produce a smoothed boundary. Except for class 8, the images were not
selected to show the maximum variation from sample to sample within
a given class; they are typical images and demonstrate the typical
variations from sample to sample for this input system. Class 8
contains a reduced size facial profile that was used to check the
recognition systems' size invariance.


APPENDIX B: SELECTED PROFILES AFTER EDGE EXTRACTION

This  appendix  consists  of  the  same  twenty  facial  profiles  of 
Appendix  A,  but  after  the  front  edge  of  the  profile  has  been  extracted. 
It  can  be  seen  that  the  sample-to-sample  variation  within  a given  class 
is  less  than  that  of  the  full  profile,  mainly  because  of  the  removal  of 
the  hairline.  Width  variations  in  the  extracted  edge  due  to  area 
changes  in  the  full  profile  are  not  noticeable  (see  Section  2.3). 


[Figures: extracted profile edges; surviving panel labels: Class 7, Class 9]

APPENDIX C: COMPUTER PROGRAMS


This  appendix  contains  the  more  pertinent  programs  written  for 
this  thesis.  The  programs  are  written  in  FORTRAN  IV,  except  for 
three  bit  manipulation  programs  and  a random  number  generator  that  are 
written  in  PDP-9  assembler.  There  are  three  main  routines  in  this 
appendix. RECOG3 is an on-line facial profile recognition routine.
Input is from either the television or a disk file. An image may be
filtered, stored in a disk file, or both. An identification of the
input image may be requested and various operations on the authority
file are permitted. RECOG4 is the routine that was used to generate
a file of feature vectors corresponding to the 120 facial profile
images. The images had been filtered using RECOG3 before they were
used with RECOG4. RECOG5 is the routine used to generate an authority
file  using  either  the  bubble  or  activity  sort  rule.  The  recognition 
accuracy  of  the  authority  file  may  also  be  tested  with  this  routine. 

Because  of  the  experimental  nature  of  this  work,  the  programs 
were  written  as  short  subroutines  that  could  be  called  by  the  various 
main  routines  to  avoid  duplicating  large  quantities  of  code.  Each 
subroutine was designed to perform one well defined task. CRCT1
corrects a hardware flaw in the television interface by adding some bits
that are not input and deleting some noise points on the border of
the image. FILOP2 allows manipulation of a vector file. The file
may be listed or cleared, or a vector may be entered or deleted.
FLTR2 removes small noise clusters and smooths the boundary of a
binary image. IDENT2 performs a distance-weighted k-nearest neighbor
classification of an unknown feature vector for some specified
authority file. RANDOM is a random number generator. SORT1 is an
implementation of the bubble sort training rule. SORT2 is an
implementation of the activity sort training rule. XFRM3 performs the
circular autocorrelation function on a binary image, generating the
12-dimensional feature vector. XTRCT1 extracts the right-hand
(zero degree) edge of an image, i.e., the front edge of a facial
profile.

Three  array  processing  routines  were  written  in  assembler  to 
complement those written by Miller [3]. INTRSC counts the number of
corresponding one-cells in two binary arrays. INVERT complements
every cell of a binary array. SHIFT translates a binary image a
specified  distance  along  the  x and  y axes. 
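
The behavior of the three assembler routines can be illustrated with a short Python sketch operating on binary arrays stored as lists of rows. Only the one-sentence descriptions above are taken from the thesis; the lowercase function names, the argument order, the zero fill at the borders, and the sign conventions for the shift are assumptions.

```python
def intrsc(a, b):
    """Count cells that are 1 in both binary arrays (same shape assumed)."""
    return sum(x & y for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def invert(a):
    """Complement every cell of a binary array."""
    return [[1 - x for x in row] for row in a]


def shift(a, dx, dy):
    """Translate a binary image by dx columns and dy rows.

    ASSUMPTION: cells shifted past the border are lost and vacated
    cells are filled with 0; positive dx moves right, positive dy down.
    """
    rows, cols = len(a), len(a[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if a[r][c] and 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = 1
    return out


image = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 0]]
shifted = shift(image, 1, 1)
print(intrsc(image, shifted))
```

Counting the overlap between an image and a translated copy of itself is the kind of primitive from which correlation-style measures can be assembled; how XFRM3 actually combines these routines is not shown in this summary.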